Abstract


The Council of Chief State School Officers (CCSSO) was awarded a grant from the National 
Science Foundation to conduct a meta analysis study with the goal of providing state and local 
education leaders with scientifically-based evidence regarding the effects of teacher professional 
development on improving student learning. The analysis focused on completed studies of 
effects of professional development for K-12 teachers of science and mathematics. The meta 
analysis results show important cross-study evidence that teacher professional development in 
mathematics does have significant positive effects on student achievement. The analysis results 
also confirm the positive relationship to student outcomes of key characteristics of design of 
professional development programs. 

Effects of Teacher Professional Development on Gains in Student Achievement: How Meta 
Analysis Provides Scientific Evidence Useful to Education Leaders 


In the present education policy environment, a high priority has been placed on improving teacher 
quality and teaching effectiveness in U.S. schools (Darling-Hammond et al., 2009; Obama, 2009). 
Standards-based educational improvement requires teachers to have deep knowledge of 
their subject and the pedagogy that is most effective for teaching the subject. States and school 
districts are charged with establishing and leading professional development programs, some 
with federal funding support, which will address major needs for improved preparation of 
teachers. The whole issue of teacher quality, including teacher preparation and ongoing 
professional development, and improving teacher effectiveness in classrooms, is at the heart of 
efforts to improve the quality and performance of our public schools. 

The Council of Chief State School Officers (CCSSO) has led recent initiatives designed to 
identify, analyze and disseminate important findings from research and evaluation studies of 
teacher professional development. Our goal is to help K-12 education decision-makers base their 
decisions on programs using the best evidence of effectiveness (Blank et al., 2007; 2008; 
http://www.ccsso.org/proiects/improving evaluation of professional development). In 2006, 
CCSSO was awarded a grant from the National Science Foundation (NSF) to conduct a meta 
analysis study with the goal of providing state and local education leaders with scientifically- 
based evidence regarding the effects of teacher professional development on improving student 
learning. The analysis has focused on completed studies of effects of professional development 
for teachers of science and mathematics. The two-year study was designed to measure and 
summarize consistent, systematic findings across multiple studies that show significant effects of 
teacher professional development on student achievement gains in K-12 mathematics or science. 
The present paper provides a summary of findings from the CCSSO meta analysis. In the paper 
we describe the studies that met the criteria for inclusion in the meta analysis, and explain the 
steps in the meta analysis methodology as applied in this area of education research. Meta 
analysis is frequently used in fields such as medicine, mental health, and criminal justice to 
confirm and validate findings across studies. Our paper helps to demonstrate why effect sizes 
and meta analysis are important for comparison of findings across education research and 
evaluation studies to adequately determine the quality and significance of evidence concerning a 
key education policy issue, such as designing and implementing teacher professional 
development. 


State Education Leader Needs for Research Evidence 

State education agencies are responsible for directing and managing the use of federal funds for 
teacher development and improvement as well as guiding programs supported by states. 
Additionally, states are now required under NCLB to report on the qualifications of teachers in 
core academic subjects and the proportion of teachers that receive high quality professional 
development each year. Finally, state education agencies provide leadership for local systems on 
how to design, select, and implement professional development for teachers. Strong, research- 
based program designs, and evidence on their effects, are now in high demand across the U.S. 
State responsibilities for administering, designing, evaluating, and reporting on federally 
supported and state-funded programs for improving teaching and teacher quality provide a strong 
rationale for the proposed work by CCSSO to lead a meta analysis study of effects of well- 
designed professional development programs. States and in turn local districts seek models for 
designing and implementing effective professional development and particularly models 
supported by research evidence. 

The CCSSO meta analysis study of effects of professional development with mathematics and 
science teachers is important for state education leaders because of four intersecting trends that 
are now strongly affecting education policy, data, and research. 

1) Federal legislation. NCLB pushes for use of scientifically-based research in program 
decisions and evaluation of effectiveness of programs. 

2) Student achievement as the preferred measure of effects of programs. The education 
community and policymakers are increasingly interested in measuring the effectiveness of 
initiatives by evidence of gains in student achievement, partly due to the improved 
capacity of data systems to relate programs to student outcomes. 

3) Recent research findings. A large body of research has identified the design and 
features of professional development for teachers that are more likely to produce 
effects on student learning. 

4) State leadership needed with teacher development resources. Typically, states play a 
small policy role in the design and evaluation of professional development, and local 
program designs are often not based on research evidence and thus may lack a coherent or 
consistent focus. 

Federal legislation supporting funding for K-12 public education under No Child Left Behind 
(NCLB) has produced a strong push toward application of results from scientifically based 
research in education program decisions and methods of evaluation. NCLB regulations call for 
programs that have been proven effective through scientifically-based research (Shavelson & 
Towne, 2002). In implementing NCLB through the several Title programs, the U.S. Department 
of Education has advocated for program evaluations that are based on experimental designs. A 
challenge for state education agencies has been to carry out their legislated function of directing 
federally funded programs for teacher improvement that meet criteria for quality as specified 
under NCLB Title II (Birman et al., 2007). States have also been challenged in determining how 
to encourage and fund evaluation studies that use experimental designs, especially randomized 
controlled trials, and that would provide scientific evidence of the effects of teacher-focused 
improvement efforts on the achievement of the students those teachers teach 
(Noyce, 2006; Coalition for Evidence-Based Policy, 2003). 

Under the Title IIB Math-Science Partnership program of NCLB, program grants are awarded by 
state competitions. State education agencies are responsible for ensuring that programs include 
scientifically-based evaluations of program outcomes as well as reporting program results to the 
U.S. Department of Education. Reviews of existing program evaluations indicate that most 
professional development programs for math and science teachers are not evaluated with 
experimental designs (CCSSO, 2006; Frechtling, 2001). States and districts currently have very 
limited capacity for relating pre-service teacher preparation or professional development to 
student outcomes (Carey, 2004). 

Student achievement as the preferred outcome measure. Education research that measures 
effects of improving teacher preparation and development of teacher knowledge and skills on 
change in student achievement has developed and expanded since the 1990s. Kennedy carried 
out one of the first reviews of research on the relationship of quality of teacher preparation to 
subsequent student achievement a decade ago (1998). At that time, she identified a relatively 
small number of research studies that were able to draw a direct link between the level of teacher 
preparation in their teaching field and achievement of students. Darling-Hammond (1999) 
analyzed large-scale assessment data across the states, and her research results showed that 
teacher preparation in field was positively related to student achievement. These study findings 
resulted in extensive policy and research debate, which still continues, about the importance of 
formal teacher preparation and qualifications, including teacher certification. 

More recently, several major research synthesis projects have broadly analyzed evidence on the 
effects of mathematics and science teacher preparation and development initiatives on student 
achievement. One approach to reviewing evidence across studies is to apply a logic model and 
to examine the relationship of teacher preparation to student achievement through effects on 
intervening variables such as teacher knowledge and instructional practices (Clewell et al., 2004; 
Ingvarson, Meiers & Beavis, 2005). This kind of full analytic model allows educators and 
leaders to identify the key decisions about the organization, delivery and support of teacher 
development that contribute to positive outcomes. 

Another approach to research synthesis analysis is to specifically define teacher professional 
development initiatives and to identify those studies which reveal effects on student achievement 
directly linked to the initiative. In research for the Southwest Regional Education Lab, Yoon and 
colleagues (2007) reviewed findings from several thousand studies on the effects of teacher 
professional development programs and initiatives to determine evidence of effects on student 
achievement. The synthesis identified relevant findings by applying the ED/IES What Works 
Clearinghouse criteria for experimental design and measuring effect size. This synthesis 
identified nine studies that met the criteria in the published research literature, and all nine 
studies were based on small numbers of teachers and measurement of change with achievement 
tests closely aligned to the treatment model. A new paper by Wayne, Yoon and AIR colleagues 
(2008) describes in detail how experimental designs can be used to analyze outcomes from 
teacher preparation and development. 

Recent research on effective teacher development. A large body of education research has 
been published over the past decade which provides a base of knowledge about the 
characteristics of effective programs of teacher professional development in mathematics and 
science. The rationale for recent federal policy toward teacher professional development through 
NCLB and through NSF education programs has cited findings from research documenting 
characteristics of initiatives for teacher development that were proven effective in improving 
teaching (Garet et al., 1999; Hiebert, 1999; Loucks-Horsley et al., 1998; Corcoran & Foley, 
2003; National Commission on Teaching & America’s Future, 1996; Weiss et al., 2001; 
Guskey, 2003; Showers, Joyce, & Bennett, 1987; Supovitz, 2003). There is also extensive 
published research focusing on the role of teacher knowledge and skills in student learning, the 
kinds of knowledge teachers need, and what knowledge is critical to effective teaching (e.g., Ball 
& Bass, 2000; Borko, 2004; Hill, Schilling & Ball, 2004; Wilson & Berne, 1999). 


Although there has been strong research evidence that could contribute to improving teacher 
professional development methods and delivery, there still exists a significant gap in translating 
research into practice. Results from large-scale national studies early in this decade indicate that 
most professional development initiatives for teachers are not designed to meet the key 
characteristics of effectiveness we now recognize from research (Corcoran & Foley, 2003; Garet 
et al., 2001; Desimone et al., 2002). 

Improve state leadership. The current state role in setting policies and providing leadership for 
high quality professional development is weak in many states — that is, states may provide broad 
guidance but leave the definition, design and delivery of programs and teacher development 
services to districts, regional service agencies, or other providers (Corcoran, 2007). Currently 
most program decisions are left to school leaders or to individual teachers regarding types of 
professional development, course credits for re-licensure, or pay and promotion. The existence 
of different levels of responsibility for professional development and multiple sources of funding 
has produced a fragmented, non-targeted system of teacher development (Birman & Porter, 
2002; Choy et al., 2006; Correnti, 2007; Hezel Associates, LLC, 2007). 

Miles, Odden, Fermanich, and Archibald (2004) studied the total costs of professional 
development across a large sample of districts and found that an average of $4,380 is spent 
annually per teacher. Case studies of six districts indicate mixed results from investments in 
professional development (Chambers et al., 2008). Education systems are allocating extensive 
funds to professional development. While most teachers do receive some professional 
development each year, measurable effects are hard to demonstrate due to lack of consistency, 
content focus and coherence among the professional development activities provided. 

States can improve the use of resources and increase their policy role with teacher professional 
development through reference to findings from research and evaluation. Research on the state 
role in teacher development has been mostly limited to case studies of specific state initiatives or 
policies, and organizational characteristics related to program delivery (e.g., Cohen & Hill, 
2001). Teacher education and professional development programs conducted by institutions or 
providers supported by states and districts are evaluated as separate entities, and evaluation 
criteria and methods are diverse. Policymakers thus find it difficult to gain a comprehensive 
picture of what works best in improving teacher skills and knowledge, or even of what effect 
different amounts of coursework or pre-service education have on improving teaching. 

Also, states now have better access to data for measuring effects of programs on student 
achievement. NCLB has provided, and continues to provide, funding and support for statewide 
integrated data systems with student and teacher records that allow longitudinal analysis of 
student achievement and measurement of improvement from grade to grade, and about half the states 
have received competitive grants to improve longitudinal data systems (National Center for 
Education Statistics, n.d.). States and districts are in a better position to employ large databases 
to analyze the effects of specific program interventions, such as teacher professional 
development, than they were even three years ago. Now, analysis of state data from education 




CCSSO, Effects of Teacher Professional Development: 2009 



information systems is supported by a new federally-supported center — National Center for 
Analysis of Longitudinal Data in Education Research or CALDER (Harris & Sass, 2007). 

Study Questions 

The CCSSO meta analysis study focused on identifying research from recent studies that 
measure effects of teacher professional development with a content focus on math or science. 

The meta analysis was carried out to address two primary questions: 

1) What are the effects of content-focused professional development for math and science 
teachers on improving student achievement as demonstrated across a range of studies? 

2) What characteristics of professional development programs (e.g., content focus, duration, 
coherence, active learning, and collective participation of teachers) explain the degree of 
effectiveness, and are the findings consistent with prior research on effective professional 
development? 

One goal of the present paper is to report on the results of the meta analysis which has been 
completed by the CCSSO study team. A second goal of the paper is to report on the use of meta 
analysis as a method for providing evidence for education decision-makers. The paper describes 
the methodology developed and carried out by the CCSSO team. With the current needs of 
education leaders for scientifically-based research evidence of program effects, we can report on 
the process for conducting this meta analysis as an important outcome of the study. The study 
results also include a set of common criteria for identifying studies demonstrating significant 
effects and how statistical procedures are used to establish acceptable effect sizes across a range 
of studies with varying treatments, sample sizes, and outcome measures. The paper will outline 
the meta analysis steps toward identifying accepted studies and effects, and then describe the 
important programmatic findings gained from the studies. 

The meta analysis study data collection follows the broad logic model for evaluating professional 
development developed in previous CCSSO projects (see Figure 1). In particular, the meta 
analysis study design centered on two areas: capturing the characteristics of the professional 
development models discussed in the studies, and documenting the resulting measurable student 
outcomes the studies attribute to the professional development programs. 

Figure 1: Logic Model

Study Design

The design for the CCSSO meta analysis built on prior studies in education (Borman et al., 2002; 
Yoon et al., 2007; Lipsey & Wilson, 2001) and applied their approach to findings about professional 
development across states and districts. The study design had four basic steps: 

1) identification and collection of potential studies, 

2) determination of study eligibility and conduct of the coding process, 

3) data analysis, and 

4) reporting and dissemination. 

Figure 2 illustrates the process in more detail. 

At the start of the CCSSO meta analysis, discussions with researchers from the American 
Institutes for Research (AIR) who were conducting a teacher professional development 
systematic review for the Southwest Regional Educational Laboratory (REL Southwest) 
precipitated adjustments in the literature search for the CCSSO study. Whereas the AIR-REL 
Southwest project focused on only published studies that cover reading/English language arts, 
mathematics and science from Australia, Canada, the United Kingdom and the United States, we 
widened our literature search to include unpublished works and yearly evaluation reports from 
ongoing projects. 

From May through November 2007, we conducted an intensive electronic search using the 
following databases and meta-databases: ERIC, PsycINFO, ProQuest, EBSCO host Academic 
Premier Search and Education Abstracts, Dissertation Abstracts, and the database of the 
Campbell Collaboration. Search words used included “professional development,” “staff 
development,” “math,” “science,” “research study,” and “student achievement.” We also 
reviewed the online database Teacher Qualifications and Quality of Teaching 
(http://ott.educ.msu.edu/tqqt/). 

Figure 2: Overview of the study design

In addition, searches were conducted targeting certain periodicals, namely, Review of 
Educational Research, Educational Evaluation and Policy Analysis, Education Policy Analysis 
Archives, TC Record, Journal of Research in Science Teaching, Science Education, Electronic 
Journal of Science Education, Research in Science & Technological Education, Journal of 
Science Education and Technology, Electronic Journal of Literacy Through Science, Taylor and 
Francis Group of scholarly periodicals, Journal of Chemical Education, ERS Spectrum, and 
School Science and Mathematics. Journals from associations such as the National Association 
for Research in Science Teaching, the Association for Science Education, and the American 
Educational Research Association (AERA) were reviewed. With the latter, additional searches 
were conducted among the 2007 annual meeting abstracts to identify potential documents. 
CCSSO also examined the publications and databases of key research centers including RAND, 
Research for Better Schools, the Center for Research on the Education of Students Placed At 
Risk (CRESPAR), Consortium of Policy Research in Education (CPRE) and the Center for 
Comprehensive School Reform and Improvement. Lastly, CCSSO solicited principal 
investigators listed in USDOE Teacher Preparation Continuum, NSF MSP projects, IES funded 
projects, and Local Systemic Initiative (LSI) study sites. 

Cross-checks were also conducted using the prior reviews in teacher professional development. 
In particular, documents that were identified in the AIR-REL Southwest studies that passed its 
prescreening phase, reports that were included in the review conducted by Abt Associates 
(O’Reilly & Weiss, 2006; Scher & O’Reilly, 2007) and in the seminal Kennedy review (1998) 
were highlighted for inclusion. 

As a result, 416 reports were identified for pre-screening. A review of their abstracts eliminated 
82 percent or 342 reports because they were deemed irrelevant based on the pre-screening 
criteria (see Table 1). The remaining 74 documents were screened by a team of trained coders. 

Table 1: Pre-Screening Criteria 

Criterion: Description 

Topic Focus: The document discusses the effects of in-service teacher professional 
development on student learning. 

Population Focus: The study sample focused on teachers of mathematics and/or science 
and their students in grades K-12. 

Study Design: The document discusses an empirical study. 

Outcomes: The document must report direct student achievement outcomes, not 
distal student outcomes such as feelings, impressions or opinions from 
students about their learning. 

Time Frame: The document had to be released between Jan. 1, 1986 and August 31, 2007. 

Country: The study had to take place in the United States. 
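
The six criteria in Table 1 act as a conjunction: a report moves forward only if it meets all of them. A minimal sketch of that rule in Python follows. The `Report` record and its field names are hypothetical and chosen for illustration; they are not taken from the screening instrument used in the study.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Report:
    """Hypothetical summary of one report's abstract (illustrative fields only)."""
    discusses_inservice_pd_effects: bool  # Topic Focus
    subjects: set                         # e.g. {"math"} or {"science"}
    grades_k12: bool                      # Population Focus
    is_empirical: bool                    # Study Design
    reports_student_achievement: bool     # Outcomes
    released: date                        # Time Frame
    country: str                          # Country


# Release window stated in Table 1.
WINDOW_START = date(1986, 1, 1)
WINDOW_END = date(2007, 8, 31)


def passes_prescreen(r: Report) -> bool:
    """Return True only if the report meets all six Table 1 criteria."""
    return (
        r.discusses_inservice_pd_effects
        and bool(r.subjects & {"math", "science"})
        and r.grades_k12
        and r.is_empirical
        and r.reports_student_achievement
        and WINDOW_START <= r.released <= WINDOW_END
        and r.country == "United States"
    )
```

In the study itself this judgment was made by reviewers reading abstracts; the sketch only makes the all-criteria-must-hold logic explicit.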


Coding Form 

We adopted the coding form and reconciliation form used by AIR in their review (see Appendix 
A). The coding form was a systematic template that assisted coders both in classifying the 
pool of potential studies for inclusion and in collecting the information from each study 
that was used in the meta analysis. The coding form appeared as an Excel file with 
multiple spreadsheets. A coder used the first spreadsheet to record his or her determination that 
the document 1) presents an empirical study with quantitative data on an in-service professional 
development program for teachers of math and/or science and includes student achievement 
outcomes; 2) uses a research design that produces valid and measurable results; 3) reports at least 
one effect size or provides sufficient data to compute at least one effect size; and 4) records some 
professional development characteristics. At each step (see Figure 3) the study was sorted for or 
against further consideration and inclusion. Subsequent spreadsheets in the file collected the data 
used for the meta analysis: student and teacher outcome measures, sample sizes of teacher 
and student populations, teacher and student characteristics, attributes of the professional 
development program or initiative, and statistical data from the study’s results that, once entered, 
were used to compute effect sizes automatically based on the student outcome measures. Information on 
the completed coding sheet was transferred to the reconciliation form which recorded input from 
both coders assigned to review the document. A third member reconciled any conflicting codes 
recorded by the coders and presented the final data through the reconciliation form to be entered 
for data processing and analysis. 
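
The formulas built into the coding form are not reproduced in this section. As a rough illustration of the kind of calculation involved, the sketch below computes a standardized mean difference from group means, standard deviations, and sample sizes, then applies the usual small-sample correction (Hedges' g). This is an assumption about a common approach, not a description of the spreadsheet's actual formulas, and the function names are hypothetical.

```python
import math


def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def hedges_g(d, n_t, n_c):
    """Apply the small-sample bias correction J = 1 - 3 / (4 * df - 1)."""
    df = n_t + n_c - 2
    return d * (1 - 3 / (4 * df - 1))
```

For example, a treatment group mean of 55 and a comparison group mean of 50, each with a standard deviation of 10 and 30 students, gives d = 0.5 and a slightly smaller corrected value.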

We recruited and trained a cadre of graduate students (mostly doctoral students in education and 
statistics) from Florida State University, George Mason University and George Washington 
University to code the 74 pre-screened documents. An initial full-day training was followed a 
week later by a one-hour post-training session to gauge coders’ comfort level with the task at 
hand and to address any lingering general questions about the coding process. At the end of the 
full-day training session, coders were assigned to specific documents and to rotating 
teams of two. Coding and reconciling assignments were rotated throughout the coding process to 
maintain independent and unbiased reviews. Using a password-protected open source online 
portal run by Liferay called “Communities,” the coders worked remotely and asynchronously 
and posted their results to a common area. Disputes in coding were settled by an assigned 
reconciler, who made the final judgment for each question in the coding form. Questions 
and comments during the coding process were handled by phone, by email, or on 
the Communities comment page. 

Figure 3 illustrates the three stages of coding each pre-screened document underwent and the 
resulting documents that cleared each step. 

Figure 3: Flow of Documents Reviewed and Included in the Meta Analysis Study 

Stage I, Part 1 
To determine if a document meets the following criteria: empirical quantitative research paper on 
an in-service PD program for teachers of math and/or science with student achievement outcomes 
Total # of Documents: 74 
# of Documents Rejected: 41 
# of Documents Passed: 33 
Inter-rater Reliability Rate (=60/74): 0.81 

Stage I, Part 2 
To determine if a document's research design would validly produce measurable results 
Total # of Documents: 33 
# of Documents Rejected: 11 
# of Documents Passed: 22 
Inter-rater Reliability Rate (=30/33): 0.91 

Stage II 
To determine if a document has enough data to compute an effect size or reports an effect size 
Total # of Documents: 22 
# of Documents Rejected: 2 
# of Documents Passed: 20 
Inter-rater Reliability Rate (=21/22): 0.95 

Post-coding 
To determine comparable effect sizes for each document for meta analysis 
Total # of Documents: 20 
# of Documents Rejected*: 4 
# of Documents Passed: 16 


The flow chart shows that 55 percent of the 74 documents that passed the pre-screening criteria 
failed at Stage I, Part 1, primarily because they did not present an empirical quantitative study 
of an in-service professional development program with student achievement outcomes. For example, 
one document that did not continue to the next round of screening focused more on a comparison 
of two curriculum programs than on professional development. 

A third of the 33 remaining documents failed to move on to Stage II, mainly because they did not 
meet the criteria for any of the four types of eligible research designs: randomized controlled trials 
(RCTs), quasi-experimental designs (QEDs), single-subject designs, or regression discontinuity 
designs. For example, one document’s study was initially determined to be a QED but provided scant 
description of the variables by which the treatment and comparison groups of teachers and their 
students were matched. Another document failed because it reported a study that used an ex post facto 
(causal-comparative) research design and compared overall 4th and 6th grade scores at the 
district level from statewide assessments. 
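
The percentages quoted for each screening step can be checked directly from the document counts reported in the pre-screening step and in Figure 3. The short sketch below does that arithmetic; the variable and function names are illustrative only.

```python
# (stage name, documents entering the stage, documents passing the stage),
# using the counts reported for pre-screening and in Figure 3.
stages = [
    ("Pre-screening", 416, 74),
    ("Stage I, Part 1", 74, 33),
    ("Stage I, Part 2", 33, 22),
    ("Stage II", 22, 20),
    ("Post-coding", 20, 16),
]


def rejection_rates(stage_counts):
    """Return the percent of entering documents rejected at each stage."""
    return {
        name: round(100 * (entered - passed) / entered)
        for name, entered, passed in stage_counts
    }


rates = rejection_rates(stages)
for name, pct in rates.items():
    print(f"{name}: {pct}% rejected")
```

The results agree with the text: 82 percent eliminated at pre-screening, 55 percent at Stage I, Part 1, and a third at Stage I, Part 2.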




In Stage II of the screening process, two out of the 22 documents were eliminated from further 
consideration. One document had insufficient information to calculate an effect size. The other 
document was later found to focus much more on curriculum than on professional development. 

Moreover, several documents were rejected during the post-coding stage when effect sizes were 
calculated using non-standard formulas. One document was found to be an earlier version of 
another, more complete report. After further review, a second document did not have sample 
size data to compute an effect size. A third document utilized hierarchical linear modeling as its 
quantitative analysis, but failed to provide sufficient information for the researchers to calculate a 
posttest effect size comparable with those from other documents. Finally, a fourth document was 
eliminated after a series of homogeneity tests (see Appendix C) were run which showed that it 
had generated unusually large effect sizes. 
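
The homogeneity tests themselves are described in Appendix C, which is not reproduced here. As a sketch only, and assuming the widely used inverse-variance Q statistic rather than the study's exact procedure, a homogeneity check across effect sizes could look like the following. The function name is hypothetical.

```python
def q_statistic(effects, variances):
    """Return (weighted mean effect, Q) using inverse-variance weights.

    Q is the weighted sum of squared deviations of each effect size from
    the weighted mean. Under homogeneity it is compared against a
    chi-square distribution with k - 1 degrees of freedom, where k is the
    number of effect sizes.
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("effects and variances must be non-empty and equal in length")
    weights = [1.0 / v for v in variances]
    mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - mean) ** 2 for w, d in zip(weights, effects))
    return mean, q
```

A study with an unusually large effect size inflates Q, so recomputing Q with and without that study is one simple way to see how much it contributes to heterogeneity.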

Figure 3 also notes the inter-rater reliability at each stage of the coding process. The inter-rater 
reliability rate illustrates the degree of agreement between the assigned coders. As shown, the 
inter-rater reliability ranged from .81 to .95, showing a high degree of agreement between the 
two coders. 
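
The reliability rates in Figure 3 are simple agreement proportions: the number of documents on which both coders agreed, divided by the number of documents coded at that stage. The sketch below reproduces them from the counts shown in the figure; the function and variable names are illustrative.

```python
def agreement_rate(agreed, total):
    """Proportion of documents on which the two coders gave the same code."""
    if total <= 0:
        raise ValueError("total must be positive")
    return agreed / total


# (documents with agreement, documents coded), as shown in Figure 3.
stage_counts = {
    "Stage I, Part 1": (60, 74),
    "Stage I, Part 2": (30, 33),
    "Stage II": (21, 22),
}

reliability = {
    stage: round(agreement_rate(agreed, total), 2)
    for stage, (agreed, total) in stage_counts.items()
}
print(reliability)
```

Note that a raw agreement proportion does not adjust for agreement expected by chance, which is what statistics such as Cohen's kappa are designed to do.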


Results from the Coding Review 

The coding and review process and the post-coding statistical analysis yielded 16 documents of 
studies to be included in the meta analysis, with two studies covering the same initiative, the 
Northeast Front Range Math/Science Partnership (MSP). The documents (referred to as 
“studies” from this point forward) are listed in Table 2. Twelve studies reported on math 
professional development and student achievement effects and four studies reported on science. 
Six studies had randomized control trial or RCT designs, of which only one was in science. The 
other ten studies were conducted using a quasi-experimental design (QED) which requires 
matched treatment and comparison groups. Of those, three were on science, with the remainder 
dealing with math. Ten of the studies covered elementary grades (grades 1 through 6), seven 
covered middle grades (grades 7 and 8), and three reported results at the high school level. 

Several types of student assessment instruments were used to generate measurable results for 
students across the studies. Eleven of the sixteen studies used at least one nationally known 
assessment or statewide standardized assessment. The remaining five relied on assessments 
specific to the professional development initiative and evaluation. The Lane study used released 
test items from the Colorado Student Assessment Program (CSAP), while the Jagielski study 
used released test items from the National Assessment of Educational Progress (NAEP). 
Although on their own these are widely known standardized assessments, the use of specific test 
items from their respective pool suggests intent by the researcher to capture a specific 
phenomenon associated with the professional development initiative. Nine kinds of criterion- 
referenced instruments were used, including state assessments — Texas Assessment of Academic 
Skills (TAAS), Colorado, and Oregon Technology Enhanced State Assessment (TESA). Six 
national norm-referenced tests were employed in the studies — Metropolitan Achievement Test 
(MAT), the Iowa Test of Basic Skills (ITBS), ACCUPLACER, Terra Nova, and the Northwest 
Education Association Assessments (NWEA). 
Six of the 16 studies relied on assessments that were unique to the project in order to measure 
student performance. These studies had small- to medium-sized groups of teachers participating 
in the professional development program, ranging from three teachers in one study to 87 in 
another. The number of assessed students varied from 63 to 936. Two studies aggregated 
student results to the classroom level, with one having 17 classes of students and another 20 
classes. Ten of the studies used quasi-experimental designs (QED) that relied on comparable 
groups of teachers and students, while six studies used random assignment of teachers to 
the treatment or control groups. 
Table 2: List of Identified Studies and Key Study Characteristics 

| Study | Publication Status | Study Design | Content Area | School Level | Treatment Teachers N Size (All Teachers) | Treatment Students N Size (All Students) | Student Outcome Measure | Test Type/Reference |
|---|---|---|---|---|---|---|---|---|
| Carpenter, et al., 1989 | Journal Article | RCT | Math | Elementary | 20 (40) | 20 (40, by class) | ITBS (Level 7; Interviews on number facts & problem; Study-specific tests (Scale 1,2,3) | National/Norm-referenced; PD-specific/; PD-specific/ |
| Dickson, 2002 | Dissertation | QED | Science | Middle (8th) & High (9th & 10th) | 4 (8) | 86 (165) | Texas Assessment of Academic Skills (TAAS) (8th); End-of-Course Biology Test (9th & 10th) | State/Criterion-referenced; State Norm-referenced |
| Heller et al., 2007 | Report | RCT | Math | Elementary (2nd, 4th, 6th) | 48 | 936 (1971) | Math Pathways and Pitfalls (MPP) Pitfalls Quiz | PD-specific |
| Jagielski, 1991 | Dissertation | QED | Math | Elementary (3rd-6th), Middle (7th, 8th) | 43 (70) | 63 (70) | Study-specific assessment MCIP/89 using released NAEP test items | PD-specific/Criterion-referenced |
| Lane, 2003 | Dissertation | QED | Math | Elementary | 12 (22) | 245 (490) | Constructed CSAP | PD-specific/Criterion-referenced |
| META Associates, 2006 | Report | QED | Math | Middle (6th, 7th, 8th) | 19 (34) | 495 (767) | Colorado Student Assessment Program (CSAP) | State/Criterion-referenced |
| META Associates, 2007 | Report | QED | Math | Middle (6th, 7th, 8th) | 17 (40) | 1099 (2256) | Student achievement as measured by Colorado Student Assessment Program (CSAP) | State/Criterion-referenced |
| Meyer & Sutton, 2006 | Report | QED | Math | Middle (6th, 7th, 8th) | 31 (155) | (7813) | Metropolitan Achievement Test (MAT); Criterion Referenced Test | Local/Criterion-referenced; Local/Criterion-referenced |
| Niess, 2005 | Report | RCT | Math | Elementary & Middle (3-8th) | 24 (42) | 310 (985) | Technology Enhanced State Assessment (TESA) | State/Criterion-referenced |
| Palmer & Nelson, 2006* | Report | QED | Science | Elementary (5th, 6th), Middle (7th, 8th) & High (9th, 10th) | 16 (43) | 396 (792) | Northwest Evaluation Association (NWEA) assessments | National/Norm-referenced |
| Rubin & Norman, 1992 | Journal Article | RCT | Science | Middle | 7 (16) | 108 (324) | Middle Grades Integrated Process Skill Test (MIPT); Group Assessment of Logical Thinking Test (GALT) | PD-specific/Not reported; Nat'l-Int'l/Criterion-referenced |
| Saxe, Gearhart, & Nasir, 2001 | Journal Article | QED | Math | Elementary | 17 (6) | 17 (23, by class) | Study-specific assessments (Computational Scale); Study-specific assessments (Conceptual Scale) | PD-specific/Not reported; PD-specific/Not reported |
| Scott, 2005 | Dissertation | QED | Science | Elementary (3rd) | 3 (6) | 66 (100) | Iowa Test of Basic Skills (ITBS) | National/Norm-referenced |
| Siegle & McCoach, 2007 | Journal Article | RCT | Math | Elementary (5th) | 7 (15) | 430 (872) | Math Achievement Test | National/Norm-referenced |
| Snippe, 1992 | Report | RCT | Math | High | 87 (198) | 114 (274) | Terra Nova; ACCUPLACER; WorkKeys | National/Norm-referenced; National/Norm-referenced; National/Criterion-referenced |
| Walsh-Cavazos, 1994 | Dissertation | QED | Math | Elementary (5th) | 4 (6) | 78 (111) | PSG Achievement Assessment | PD-specific/Not reported |

Where a study lists several outcome measures, the test types are given in the same order. 
Reporting and Analyzing Effect Size 

An effect size (ES) is the “difference on a criterion measure between an experimental and a 
control group divided by the control group’s standard deviation” (McMillan & Schumacher, 
1997, p. 148). It provides a measure common across all the studies and gives a sense of the 
magnitude of the effect a treatment has on a dependent variable. For the CCSSO meta analysis 
study, we analyzed the effect teacher professional development has — in its various forms 
presented by the programs described in the studies — on student achievement outcomes (Lipsey 
& Wilson, 2001). 
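
The definition quoted above can be illustrated with a short calculation. The following is a minimal sketch, not the computation used in the CCSSO analysis: the function name `effect_size`, the use of the sample (n - 1) standard deviation, and the scores themselves are assumptions made only for illustration.

```python
from statistics import mean, stdev

def effect_size(treatment_scores, control_scores):
    """Difference in group means divided by the control group's SD."""
    # Assumption: sample standard deviation (n - 1 denominator).
    sd_control = stdev(control_scores)
    if sd_control == 0:
        raise ValueError("control group has zero variance")
    return (mean(treatment_scores) - mean(control_scores)) / sd_control

# Hypothetical scores, invented for illustration only.
treatment = [52, 55, 61, 58, 64]
control = [50, 53, 57, 54, 56]
es = effect_size(treatment, control)
print(round(es, 2))
```

Because the denominator is the control group's standard deviation, the result is expressed in control-group standard deviation units, which is what allows outcomes measured on different tests to be compared on one scale.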

The sixteen studies generated a sum total of 104 effect sizes. Table 3 reports several example 
effect sizes for each study as well as the range, and features the variety of effect sizes resulting 
from the many measures possible per study, including posttest only and pretest-posttest gains. 

The number of effects for each study ranged from two to 21, with an average of 6.5 effects 
per study. Six of the studies reported only two effect sizes. The Meyer and Sutton study 
reported ten effect sizes due to the abundance of test results generated from having three grades 
tested (grades 6, 7 and 8) and from two types of tests that had several constructs such as 
concept and problem-solving, math procedures, algebra, computation, data analysis, geometry 
and measurement, and numeration. The Snippe study generated 21 effect sizes because all three 
standardized tests (Terra Nova, ACCUPLACER, and WorkKeys) were administered to six 
different study sites. The Jagielski study produced twenty effects as a result of comparisons of 
two treatment groups to the control group across five different test questions set according to 
NAEP proficiency levels. When analyzing multiple effects, we need to consider whether the 
effects are produced from independent samples of teachers and students. Dependence among 
effect sizes can arise when data are not drawn from independent samples. 
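
One common way to limit dependence among effect sizes is to average the effects within each study so that every study contributes a single value. The sketch below illustrates that general idea only; it is an assumption for illustration and is not described in this report as the procedure the authors followed. The helper name `average_within_study` is hypothetical, and the input values are example effect sizes listed in Table 3.

```python
from collections import defaultdict
from statistics import mean

def average_within_study(effects):
    """Collapse (study, effect_size) pairs to one mean effect size per study."""
    by_study = defaultdict(list)
    for study, es in effects:
        by_study[study].append(es)
    return {study: mean(values) for study, values in by_study.items()}

# Example effect sizes as listed for two studies in Table 3.
example_effects = [
    ("Carpenter et al., 1989", 0.39),
    ("Carpenter et al., 1989", 0.68),
    ("Carpenter et al., 1989", 0.32),
    ("Lane, 2003", 0.13),
]
per_study = average_within_study(example_effects)
for study, es in per_study.items():
    print(study, round(es, 2))
```

Averaging trades detail for independence: differences among measures within a study are no longer visible, but no single study dominates the pooled result simply because it reported many effects.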

One challenge in making effect sizes useful to educators is to translate the ES into something 
meaningful, e.g., practical effects, and Cohen’s d statistic provides a useful guide 
(Lipsey & Wilson, 2001). Using the Cohen’s d standard guidelines for effect sizes, 56 percent of 
the effect sizes in our study are small (0.01 to 0.2 is considered small). Twenty percent of the 
effect sizes were negative, suggesting that students of teachers who received the professional 
development treatment fared worse than their counterpart students. Nearly 8 percent of the 104 
effect sizes are considered small-medium or medium, with medium set at d = 0.5. 
Only two effect sizes, one stemming from the Saxe et al. study (ES = 2.54) and another from the 
Snippe study (ES = .79), can be considered large, with five other studies coming close with 
ES ranging from .68 to .78. Appendix B provides a complete and detailed listing of all the 
effect sizes generated from each study. 
Table 3: Example Effect Sizes Reported in Studies 

| Study | Number of Effects (Total=104) | Range of Effect Sizes | Student Outcome Measure | Effect Size | Cohen’s d Standard |
|---|---|---|---|---|---|
| Carpenter, et al., 1989 | 7 | 0.11 to 0.69 | Average posttest results from Iowa Test of Basic Skills (ITBS) | 0.39 | Small |
| | | | Interviews on number facts & problem | 0.68 | Medium |
| | | | Average across Scales 1-3 of study-specific test | 0.32 | Small |
| Dickson, 2002 | 2 | 0.10 to 0.43 | Texas Assessment of Academic Skills (TAAS) (8th) | 0.10 | - |
| | | | End-of-Course Biology Test (9th & 10th) | 0.43 | Small-medium |
| Heller et al., 2007 | 6 | 0.27 to 0.76 | Pretest-posttest gain (4th) on Math Pathways and Pitfalls (MPP) Pitfalls Quiz | 0.69 | Medium |
| Jagielski, 1991 | 20 | -0.42 to 0.78 | Average of pretest-posttest gains of both treatment groups on study-specific assessment-Level 250 NAEP test item | 0.77 | Medium-large |
| Lane, 2003 | 2 | 0.08 to 0.13 | Pretest-posttest gain on Constructed CSAP | 0.13 | Small |
| META Associates, 2006 | 6 | -1.52 to 0.22 | Average of pretest-posttest gains (6th, 7th, 8th) on Colorado Student Assessment Program (CSAP) | 0.13 | Small |
| META Associates, 2007 | 2 | -0.19 to 0.11 | Pretest-posttest gain on Colorado Student Assessment Program (CSAP) | -0.19 | - |
| Meyer & Sutton, 2006 | 10 | -0.10 to 0.13 | Average of overall posttests (6th, 7th) in Metropolitan Achievement Test (MAT) | -0.02 | - |
| | | | Overall posttest in Criterion Referenced Test | 0.10 | Small |
| Niess, 2005 | 4 | -0.14 to 0.37 | Pretest-posttest gain (Middle) in Technology Enhanced State Assessment (TESA) | 0.11 | Small |
| Palmer & Nelson, 2006 | 5 | -0.21 to 0.11 | Pretest-posttest gain (Grades 3rd, 5th, 6th) in Northwest Evaluation Association (NWEA) assessments | 0.11 | Small |
| Rubin & Norman, 1992 | 8 | -0.36 to 0.64 | Pretest-posttest gain (Treatment vs. Control II) in Middle Grades Integrated Process Skill Test (MIPT) | 0.64 | Medium |
| | | | Pretest-posttest gain (Treatment vs. Control II) in Group Assessment of Logical Thinking Test (GALT) | 0.12 | Small |
| Saxe, Gearhart, & Nasir, 2001 | 6 | -1.55 to 2.54 | Average posttest results from study-specific assessments (Conceptual Scale) | 1.63 | Large |
| Scott, 2005 | 2 | 0.20 to 0.54 | Pretest-posttest gain on Iowa Test of Basic Skills (ITBS) | 0.20 | Small |
| Siegle & McCoach, 2007 | 2 | 0.20 to 0.22 | Cluster result on Math Achievement Test | 0.20 | Small |
| Snippe, 1992 | 21 | -0.43 to 0.79 | Terra Nova | -0.01 | - |
| | | | ACCUPLACER | 0.20 | Small |
| | | | WorkKeys | 0.06 | |
| Walsh-Cavazos, 1994 | 2 | 0.26 to 0.56 | Pretest-posttest gain PSG Achievement Assessment | 0.26 | Small |
Reviewing across studies, most of the effect sizes from the 16 studies are found to be modest. 
This often stems from controlling for prior testing results from both the treatment and 
comparison groups and examining whether, and by how much, students taught by teachers in the 
treatment group gained relative to their respective comparison group. For example, in the Niess 
(2005) study, students of teachers who participated in the professional development activities 
associated with the High Desert Math Science Partnership (MSP) initiative did improve on the 
state’s standardized assessment (posttest ES = .13). However, after controlling for their prior 
performance on the assessment and comparing it to their counterparts whose teachers did not 
participate in the High Desert MSP initiative, the results remain positive but smaller 
(pretest-posttest gain ES = .10). A similar difference between the pre-post effect size and the 
post-test effect size was found in the study results from Palmer and Nelson (2006), Meta 
Associates (2006), Lane (2003), Scott (2005), and Walsh-Cavazos (1994). 

Another factor that may result in the modest effect sizes is the use of standardized assessments to 
capture measurable student outcomes as a result of the professional development initiatives. All 
the aforementioned studies used either statewide criterion-referenced assessments or nationally 
norm-referenced assessments. These tests may not be fine-tuned to capture the areas that the 
professional development initiatives are intending to impact. For example, the Lane study 
examined a professional development initiative with an objective of improving the 
problem-solving and reasoning skills of fifth grade students by deepening their teachers’ 
understanding of math concepts and providing them teaching strategies in problem solving and in 
modeling the use of questioning, critical thinking, and new vocabulary to their students. The 
Colorado Student Assessment Program’s standardized tests may not have captured the full measure 
of student gains as a result of the professional development the students’ teachers received. 

Looking at it another way, studies that utilized student measures that are closer to the heart of 
what the professional development is intended to impact do report larger effect sizes. In the 
Rubin and Norman study (1992), the researchers were evaluating a professional development 
initiative which trained middle school teachers in science processing skills and ways to model 
the science processing skills to their students. The study utilized the Middle Grades Integrated 
Process Skills Test (MIPT), a lesser-known assessment that measures student proficiency in 
understanding the skills that scientists use to explore and analyze a phenomenon. Not 
surprisingly, the study found that students whose teachers participated in the professional 
development had greater understanding of the process skills compared to their non-equivalent 
counterparts whose teachers did not receive the professional development, even after controlling 
for prior performance (MIPT ES = .63). Similar cases can be found with the interview results 
from the Carpenter et al. study (1989), with an ES of .68, and the Saxe, Gearhart and Nasir study 
(2001), with an ES of 1.63 resulting from average posttest results on the conceptual scale of 
their study-specific assessment. Jagielski utilized released NAEP items for formulating her 
study-specific assessment, and the test items were selected as items to measure problem solving 
abilities of students with teachers who received (or did not receive) training in the problem 
solving standard from the NCTM Curriculum and Evaluation Standards for School Mathematics. 
Thus, it was not surprising to find that the pretest-posttest gain ES for the two treatment 
groups was .77 on one test item. 
Two studies had more than one treatment group or comparison group. The Rubin and Norman 
study involved two control groups. The treatment group of teachers received professional 
development in the use of a systematic modeling strategy to increase the scientific approach 
process skills. The treatment group is compared to a control group of teachers that received 
training on a substitute strategy, the learning cycle. The second comparison group received no 
special training. Table 3 shows that students of teachers who received training in the use of the 
systematic modeling strategy exhibited a significant positive difference in process skills 
achievement relative to their peers whose teachers received no special training. 

The Jagielski study utilized a train-the-trainer model and thus involved two treatment groups. 
The first treatment group teachers attended problem-solving workshops at Loyola University and 
were trained by university staff. The second treatment group was composed of in-school 
colleagues recruited by teachers who attended the workshops. Both groups were compared 
against teachers who received no training. Table 3 shows that on average students of either 
treatment group exhibited a significant positive difference in their ability to understand the basic 
mathematical operations (addition, subtraction, multiplication, and division) and apply them in 
simple one-step word problems and in analyzing graphs and charts, as compared to their control 
counterparts. 


Professional Development Features 

The designs for providing professional development with teachers in the target, or treatment, 
group vary widely across the 16 studies in the meta analysis. It is possible to observe several 
patterns in the descriptive data for the set of professional development “programs” which 
typically include a combination of activities for improving teacher knowledge and skills. 

Content focus is not reported as a separate category in the table, but the content focus for 
teachers is consistently found in the descriptions of “Teacher Learning Goal.” Content focus was 
a primary selection criterion for the meta analysis, and all the programs reported here sought to 
increase content knowledge of the teachers. 

Table 4 displays the features by study, and they vary considerably across the studies. First, the 
projects vary widely in time (contact hours) of professional development and duration (or 
overall period when implemented). Given that all of the studies reported did show positive 
effects on student achievement, we can see that there is an inconsistent pattern in the relationship 
of time and duration to effects. For example, the professional development initiatives included 
in the 16 studies differ widely in total amount of time. One professional development 
design provided only two hours of further education for teachers, six studies reported that fewer 
than 20 hours were devoted to teacher development, and four of the designs included a combination 
of activities totaling over 100 hours of teacher development. Current research shows that 
consistent effects are found when teachers have received over 100 hours of professional 
development (Banilower et al., 2006). 
Table 4: Professional Development Features of the Studies 

| Study Authors, Year | Professional Development | Teacher Learning Goal | Teachers Location | PD Provider Agency | Hours | Duration (Months) | PD Components | Teacher Active Learning |
|---|---|---|---|---|---|---|---|---|
| Carpenter, et al., 1989 | Cognitively Guided Instruction (CGI) | First grade teachers participate in a 4-week summer workshop to learn about research findings on learning and development of addition and subtraction concepts in young children and apply that learning in the classroom | 24 schools in Madison (WI) metropolitan area | Researchers/Authors | 80 | 4.5 | Summer institute; Coursework; In-service activity; Study group; Self-directed | Classroom mentoring; Professional network |
| Dickson, 2002 | Inquiry Institute Science | K-12 teachers participate in an inquiry-based staff development program from “Immersion into Science” model (Loucks-Horsley et al., 1998) | Suburban school district north central Texas | School District | 24 | 8 | In-service activity; Internship | Professional network |
| Heller et al., 2007 | Mathematics Pathways and Pitfalls (MPP) | 2nd-, 4th-, and 6th-grade teachers received introductory training and practice on strategies to motivate students to be critical thinkers of their math learning through logic and discourse | Five diverse districts across the U.S. | Researchers/Authors | 10 | 8 | Summer institute; In-service activity; Internship | Lead instruction; Observe |
| Jagielski, 1991 | Mathematics Curriculum Improvement Project | Train-the-trainer model, teachers receive training in problem-solving as recommended by the National Council of Teachers of Mathematics (NCTM) standards | Chicago, IL | University | 36 | 8 | In-service activity; Conference; Study group | Lead instruction; Lead discussion; Professional network |
| Lane, 2003 | Problem-solving and reasoning Math | Improve 5th grade teachers’ knowledge of math concepts, problem solving, questioning & critical thinking, and new vocabulary | Five schools from the same school district in Colorado | Researcher/Author | 17 | 8 | In-service activity; Study group | Develop assessment; Observe |
| META Associates, 2006 | Northeast Front Range Math/Science Partnership (MSP) | Middle school math and science teachers participate in 2-week summer institutes, follow-up Saturday institutes and lesson study to gain content and pedagogical knowledge in geometry, earth/space science, force & motion, and/or life science | Five school districts in Colorado front range | Four Colorado universities science/math faculties and one science museum | 120 | 7.5 | Summer institute; In-service activity; Coaching; Mentoring | Lead instruction; Observe; Develop assessment; Professional network |
| META Associates, 2007 | Northeast Front Range Math/Science Partnership (MSP) | Same as META Associates, 2006 | Same as META Associates, 2006 | Same as META Associates, 2006 | 120 | 7.5 | Same as META Associates, 2006 | Same as META Associates, 2006 |
| Meyer & Sutton, 2006 | Math in the Middle Institute Partnership | Train and support Grades 5-8 math teachers in math content knowledge enrichment, improved instructional strategies, and leadership skills | Lincoln, NE | University of Nebraska-Lincoln; Education Service Units | 540 | 16 | Summer institute; In-service activity; Courses | |
| Niess, 2005 | High Desert MSP Math teaching | Increase grades 3-8 math teachers’ ability to teach the subject by enriching their content and pedagogical math knowledge, and incorporating collaborative techniques. | Five school districts in central Oregon | Oregon State University | 304 | 8 | Summer institute; In-service activity | Professional network; Lead instruction; Observe |
| Palmer & Nelson, 2006 | REC Lesson Study Science | For Grades 5-12 science teachers to increase content knowledge (2-week summer institute), improve pedagogy with Lesson Study, and apply new knowledge by designing lessons to present to class. | Ten school districts in Minnesota | MN university, Schwan Food, APEN Assoc, Global Education Resources | 60 | 8 | Summer institute; Study group | Lead instruction; Develop assessment; Observe; Professional network |
| Rubin & Norman, 1992 | Systematic Modeling Strategy Science Teaching | Train middle school teachers in science process skills and modeling teaching strategy for teaching science process skills to their students | Detroit, MI | Wayne State University | 30 | 3 | Courses; In-service activity; Mentoring | |
| Saxe, Gearhart, & Nasir, 2001 | Integrating Mathematics Assessment (IMA) or Collegial Support (SUPP) | IMA: Teacher learning focused on math concepts, understanding children’s math, achievement motivations, integrated curriculum focus on fractions, measurement, & scale. Collaboration with other teachers interested in reformed (vs. traditional) instruction. SUPP: Teachers receive support and collaborative opportunities with others for implementing units on fractions, measurement & scale | Los Angeles metropolitan area | Researchers/Authors | 41 | 8 | Summer institute; In-service activity; Study group; Mentoring; Internship | Lead instruction; Develop assessment; Observe; Professional network |
| Scott, 2005 | TEAMS Professional Development Model | Build a community of professional learners, focus on instructional alignment via lesson studies, and established mentoring peer coaching through multiple activities and supports. | Suburban-Urban district Texas metropolitan area | School District | 168 | 8 | In-service activity; Summer institute; Conference; Study group; Coaching; Mentoring | Professional network; Lead discussion; Classroom mentoring; Observe |
| Siegle & McCoach, 2007 | Self-Efficacy Teaching Strategies & Implementation Math | Train 5th grade math teachers in self-efficacy teaching strategies in 3 areas: 1) goal setting, 2) teacher feedback, 3) modeling followed by an implementation of measurement unit curriculum designed by the researchers | Ten districts varying urban, suburban, rural in six states (MA, MD, MI, MT, NC, NE) | University of Connecticut | 2 | 1 day | In-service activity; Coaching | Lead instruction; Professional network |
| Snippe, 1992 | National Research Center for Career and Technical Education (NRCCTE) model | Teams of career and technology education (CTE) and math teachers learn how to improve math instruction embedded in CTE curricula by team building, using curriculum maps aligned by math concept and CTE curricula, designing lesson plans that incorporate the NRCCTE model's seven elements. | Teachers from several states; providers traveled to each location | University of Minnesota | 14 | 3 days | In-service activity; Study group | Professional network; Classroom mentoring |
| Walsh-Cavazos, 1994 | Probability, Statistics, and Graphing (PSG) Module | Teachers participate in 12 hour training in PSG module, involving manipulatives, problem-solving, and concept-development techniques | South Texas school district | Researcher/Author | 12 | 3 days | In-service activity | |
| Mean | | | | | 91 hrs. | 6 mos. | 3.3 activities | 2.1 types |
| Range | | | | | 2 - 540 hrs. | 1 day - 16 mos. | 1 - 6 activities | 1 - 4 types |
The professional development designs reported in the 16 studies were carried out from 1990 to 
the present. The federal legislation and regulations under NCLB encouraged states and districts 
to plan teacher development for a given teacher to include more hours over a longer duration, 
which reflects the research studies of the 1990s. The studies reporting the largest number of 
hours of development time per teacher were carried out since 2000. 

The providers of professional development in these studies are primarily from universities, and 
the researcher/evaluator producing the study is often from the same institution. It is likely that 
having access to evaluation expertise in a university is a major advantage for providers of 
professional development, and student achievement effects are likely enhanced when the 
professional development providers are affiliated with a university. 

One key finding from Table 4 is the evidence of multiple professional development activities, 
follow-up steps with teachers in their schools, and active learning methods that were used with 
teachers. The descriptive information on the professional development provided in these 
programs that did have effects on improving student achievement confirms evidence 
from prior research on the importance of continuing learning reinforcement activities after the 
initial period of teacher training or intensive knowledge development, such as through a summer 
institute. These effective programs included from two to six different types of activities, 
including coaching, mentoring, internship, professional networks, and study groups, in addition to 
coursework or initial in-service education. The meta analysis of studies was somewhat limited in 
being able to identify all activities that were carried out. Even so, the review procedures for 
the 16 studies produced strong evidence of active methods of teacher learning during 
professional development, such as leading instruction, discussion with colleagues, observing 
other teachers, developing assessments, and professional networks. 

Another key finding revealed in Table 4 is the nature of teacher learning goals in the 
professional development designs. Each of the brief descriptions shows clearly that these 
programs focused on helping teachers improve their knowledge of how students learn in the 
specific subject area, how to teach the subject with effective strategies, and the important 
connections between the subject content and appropriate pedagogy so that students will best 
learn. It is apparent that these programs were well planned to maximize the use of time with 
teachers so that the content of the professional development could be directly translated by the 
teacher into improvements in curriculum and instruction. 

One finding from prior research was that effectiveness is improved with collective participation 
of teachers; that is, teachers are learning with others from their school or department. To 
maximize collective involvement of teachers, some designs focus on the whole school for 
teacher development, so that all teachers are part of the training and assistance. The set of 
studies in this analysis shows mixed evidence of teachers’ collective participation in the 
professional development. Several of the studies are clearly from programs focused at school-level (e.g., 
Dickson, 2002; Lane, 2003; Scott, 2005) and did involve teachers who are teaching in the same 
context and thus are learning together. But other study descriptions indicate that teachers 
traveled off-site, enrolled, or volunteered for the intensive initial content and pedagogy training 
period, which would mean less chance of collective participation in development with their 
teaching colleagues. 
Results from Analysis: Common Findings Across Studies 

With the total number of effect sizes identified across the 16 studies in our meta analysis we can 
examine the extent to which there are significant group differences. The results of the analysis 
of means are displayed in Table 5, separately for Mathematics and Science. Our analysis first 
categorized all the studies under mathematics or science and the method of measuring effect 
(pre-post analysis vs. post-analysis only). In the mathematics education studies that employed 
pre-post measures for determining effect size, a total of 21 effect sizes were reported and the 
mean effect size was .21. Among the math studies that used a post-test only method of 
measuring effects, a total of 68 effect sizes were reported and the mean effect size was .13. The 
table below summarizes the differences in means and number of studies by major research and 
measurement categories. We are focusing the analysis on mathematics. The number of effect 
sizes for science teacher professional development studies was small (pre-post: 10 effect sizes, 
post-analysis: 7 effect sizes) and the means for the effect sizes in each category were small and 
not significantly different from zero. See Appendix C for details on computation of effect sizes. 
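
For readers who want to see how a pooled mean effect size, its standard error, a 95% confidence interval, and a Q statistic relate to one another, the sketch below uses the standard inverse-variance (fixed-effect) formulation. This is an assumption made for illustration: the report's own computations are documented in Appendix C and may differ (for example, by using a random-effects model). The function name `fixed_effect_summary` and all input values are hypothetical.

```python
import math

def fixed_effect_summary(effect_sizes, variances, z=1.96):
    """Inverse-variance weighted mean, its SE, a 95% CI, and the Q statistic."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    mean_es = sum(w * es for w, es in zip(weights, effect_sizes)) / total_w
    se = math.sqrt(1.0 / total_w)
    ci = (mean_es - z * se, mean_es + z * se)
    # Q: weighted squared deviations of each effect from the weighted mean.
    q = sum(w * (es - mean_es) ** 2 for w, es in zip(weights, effect_sizes))
    return mean_es, se, ci, q

# Hypothetical inputs, invented for illustration (not taken from the studies).
es_values = [0.10, 0.30, 0.50]
var_values = [0.04, 0.04, 0.04]
mean_es, se, ci, q = fixed_effect_summary(es_values, var_values)
print(round(mean_es, 2), round(se, 3), [round(x, 2) for x in ci], round(q, 2))
```

Under this formulation a confidence interval that excludes zero indicates a mean effect significantly different from zero, and a large Q relative to its degrees of freedom indicates heterogeneity among the effects being pooled.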

Studies that used randomized control trials (RCT) had significantly larger effect sizes than 
studies that were based on quasi-experimental designs (QED), though both sets of studies also 
showed significant heterogeneity. For the pre-post studies, the mean effect size was .27 for 
those studies using random trials as compared to a mean of .17 for studies based on 
quasi-experimental designs, which is a significant difference although the mean effect sizes are 
not substantively large (see Q values for both sets of math effects in Table 5a). 
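
The Q values cited here follow the usual partition in which the total Q equals the between-group Q plus the sum of the within-group Q values. The short check below applies that identity to the research-design figures reported in Table 5a. It is a sketch for illustration: the variable names are hypothetical, and the critical value 3.841 is the standard chi-square cutoff at alpha = .05 with one degree of freedom, supplied here as an assumption about the significance level.

```python
# Values as reported in Table 5a for the research-design comparison.
q_total = {"pre_post": 153.72, "post_only": 328.78}
q_between = {"pre_post": 46.12, "post_only": 66.72}
q_within = {
    "pre_post": {"RCT": 53.24, "QED": 54.35},
    "post_only": {"RCT": 78.37, "QED": 183.70},
}

# Critical value of chi-square at alpha = .05 with 1 degree of freedom.
CHI2_CRIT_DF1 = 3.841

def partition_gap(method):
    """Return Q_total minus (Q_between + sum of Q_within) for one method."""
    return q_total[method] - (q_between[method] + sum(q_within[method].values()))

gaps = {m: round(partition_gap(m), 2) for m in q_total}
between_significant = {m: q_between[m] > CHI2_CRIT_DF1 for m in q_between}
print(gaps, between_significant)
```

The gaps amount to rounding error in the reported values, so the table is internally consistent on this point, and both between-group Q values exceed the cutoff, in line with the statement that the RCT and QED means differ significantly.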

We also analyzed the mean effect sizes according to differences in the measures of student 
achievement that were used in the studies. Based on 15 effect sizes, the studies that used a 
pre-post test design and employed achievement measures that were aligned to the professional 
development treatment objectives (e.g., treatment focus on teaching geometric concepts and 
students are assessed on knowledge of geometric concepts) had a mean effect size of .32. Six 
effect sizes were found for studies that used statewide assessment results in mathematics as the 
outcome measure, and the mean effect size was only .01. Both of these sets of effects showed 
significant heterogeneity as well. 

For the studies that used a post-analysis only (comparing outcomes between treatment and 
control groups of teachers), four types of achievement tests were found. The mean effect size for 
the 25 effects based on a program-specific student assessment was .28, a moderate average effect 
that is educationally meaningful. The mean for 25 effects based on national norm-referenced 
assessments was .17, a statistically significant result but a smaller effect size. The mean effect 
size for the 11 effects based on local achievement tests was .05, a statistically significant finding but 
an average indicating less educational importance. The studies that used statewide 
criterion-referenced assessments had a small negative mean effect size (-.07), indicating no average 
positive effect, and there was wide variation in effect sizes across the seven effects. 
Table 5a: Mean Effect Sizes for Teacher Professional Development Effects on Student Achievement, Mathematics Studies 

Categories                    Math Pre-Post                                            Math Post-Only
                              Mean ES (SE)  N Effects 95% CI          Q statistic      Mean ES (SE)  N Effects 95% CI          Q statistic

Math Studies                  0.21 (0.08)   21        (0.06, 0.36)    QT = 153.72*     0.13 (0.03)   68        (0.07, 0.20)    QT = 328.78*
Research Design                                                       QB(1) = 46.12*                                           QB(1) = 66.72*
  RCT                         0.27 (0.13)   5         (0.01, 0.53)    Qw = 53.24*      0.26 (0.05)   35        (0.16, 0.35)    Qw = 78.37*
  QED                         0.17 (0.08)   16        (0.01, 0.34)    Qw = 54.35*      0.04 (0.04)   33        (-0.04, 0.11)   Qw = 183.70*
Measure Type                                                          QB(1) = 84.46                                            QB(3) = 90.43*
  PD Specific                 0.32 (0.08)   15        (0.16, 0.49)    Qw = 46.81       0.28 (0.09)   25        (0.10, 0.46)    Qw = 91.73*
  State Criterion-Referenced  0.01 (0.08)   6         (-0.15, 0.16)   Qw = 22.45       -0.07 (0.14)  7         (-0.35, 0.21)   Qw = 111.25*
  National Norm-Referenced    -             -         -               -                0.17 (0.04)   25        (0.10, 0.24)    Qw = 16.33
  Local Test                  -             -         -               -                0.05 (0.02)   11        (0.02, 0.09)    Qw = 19.05*

Mean ES (SE) = mean effect size (standard error); N Effects = number of effect sizes per category (across studies identified with at least one significant effect size); *p < .05. If QT is significant a random-effects 
model is applied. If Qw is not significant a fixed-effects model is applied. If Qw is significant a random-effects model is used for that category. QB refers to 
differences between groups. 
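
The Q statistics in the research design rows of Table 5a can be read together. In the usual analog-to-ANOVA partition for meta-analysis, the total heterogeneity equals the between-group heterogeneity plus the sum of the within-group values (QT = QB + sum of Qw). The report does not state this identity, so applying it here is an assumption; the short check below uses only numbers transcribed from the table, and the variable names are illustrative.

```python
# Q values transcribed from the Research Design rows of Table 5a.
table_5a_design = {
    "pre_post": {"Q_T": 153.72, "Q_B": 46.12, "Q_W": [53.24, 54.35]},   # RCT, QED
    "post_only": {"Q_T": 328.78, "Q_B": 66.72, "Q_W": [78.37, 183.70]},  # RCT, QED
}


def partition_gap(q_total, q_between, q_within):
    """Difference between Q_T and (Q_B + sum of Q_W); near 0 if the partition holds."""
    return q_total - (q_between + sum(q_within))


for analysis, q in table_5a_design.items():
    gap = partition_gap(q["Q_T"], q["Q_B"], q["Q_W"])
    print(f"{analysis}: Q_T - (Q_B + sum Q_W) = {gap:+.2f}")
```

For both analysis types the gap is about one hundredth, which is consistent with rounding in the published values.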


Table 5b: Mean Effect Sizes for Teacher Professional Development Effects on Student Achievement, Science Studies 

Categories                    Science Pre-Post                                         Science Post-Only
                              Mean ES (SE)  N Effects 95% CI          Q statistic      Mean ES (SE)  N Effects 95% CI          Q statistic

Science Studies               0.05 (0.08)   10        (-0.11, 0.20)   QT = 31.57*      0.18 (0.24)   7         (-0.29, 0.64)   QT = 84.15*
Research Design                                                       QB(1) = 1.36                                             QB(1) = 33.23*
  RCT                         0.13 (0.20)   4         (-0.26, 0.53)   Qw = 24.50*      -0.15 (0.28)  4         (-0.71, 0.41)   Qw = 47.99*
  QED                         -0.02 (0.05)  6         (-0.12, 0.09)   Qw = 5.71        0.63 (0.16)   3         (0.32, 0.94)    Qw = 2.94
Measure Type                                                          QB(2) = 14.93*                                           QB(3) = 47.27*
  PD Specific                 0.39 (0.23)   2         (-0.07, 0.85)   Qw = 5.33*       0.12 (0.42)   2         (-0.71, 0.95)   Qw = 17.41*
  State Criterion-Referenced  -             -         -               -                0.67 (0.16)   2         (0.35, 0.98)    Qw = 2.72
  National Norm-Referenced    -0.02 (0.05)  6         (-0.12, 0.09)   Qw = 5.71        0.54 (0.21)   1         (0.12, 0.96)    -
  International               -0.13 (0.24)  2         (-0.59, 0.34)   Qw = 5.61*       -0.42 (0.42)  2         (-1.24, 0.40)   Qw = 16.75*

Mean ES (SE) = mean effect size (standard error); N Effects = number of effect sizes per category (across studies identified with at least one significant effect size); *p < .05. If QT is significant a random-effects 
model is applied. If Qw is not significant a fixed-effects model is applied. If Qw is significant a random-effects model is used for that category. QB refers to 
differences between groups. 
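
The table notes describe a decision rule: apply a fixed-effects model when the homogeneity statistic Q is not significant and a random-effects model when it is. A minimal sketch of that rule follows. Several pieces are assumptions not confirmed by the report: inverse-variance weighting, the DerSimonian-Laird estimate of between-study variance, and a Wilson-Hilferty normal approximation in place of an exact chi-square p-value (used only to keep the example within the standard library). The function names and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_upper_tail_approx(q, df):
    """Approximate P(chi-square with df degrees of freedom > q), Wilson-Hilferty."""
    z = ((q / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 1 - NormalDist().cdf(z)


def pooled_effect(effects, variances, alpha=0.05):
    """Pool effect sizes, switching to random effects when Q is significant."""
    weights = [1 / v for v in variances]
    fixed_mean = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Homogeneity statistic: weighted squared deviations from the fixed mean.
    q = sum(w * (e - fixed_mean) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    p = chi2_upper_tail_approx(q, df)
    if p >= alpha:
        return {"model": "fixed", "mean": fixed_mean,
                "se": math.sqrt(1 / sum(weights)), "Q": q, "tau2": 0.0}
    # DerSimonian-Laird estimate of between-study variance (assumed method).
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)
    re_weights = [1 / (v + tau2) for v in variances]
    re_mean = sum(w * e for w, e in zip(re_weights, effects)) / sum(re_weights)
    return {"model": "random", "mean": re_mean,
            "se": math.sqrt(1 / sum(re_weights)), "Q": q, "tau2": tau2}


# Hypothetical effect sizes with equal sampling variances.
print(pooled_effect([0.0, 0.5, 1.0], [0.01, 0.01, 0.01])["model"])
print(pooled_effect([0.2, 0.2, 0.2], [0.01, 0.01, 0.01])["model"])
```

The random-effects branch widens the standard error by adding the between-study variance to each study's sampling variance, which is why categories with a significant Qw in the tables tend to show wider confidence intervals.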
Professional Development Characteristics 

We also conducted further analysis to examine differences in mean effect sizes by the 
grade span covered by the studies and by professional development 
design characteristics (see Table 6). We found that studies that targeted the elementary grades 
had larger mean effect sizes than studies that targeted middle school or high school grades. 
Fifteen effects from studies with the pre-post analysis design that covered elementary grades had 
a statistically significant mean effect of .32. With a post-only analysis design, thirty effects 
show a statistically significant mean effect size of .27. Furthermore, in the post-only analyses, 
studies of professional development programs that provide mentoring for participating teachers 
have a negative mean effect size of -.19, based on ten effects. Studies of programs that offer 
internships for their teachers have a positive mean effect size of .21 for nine effects. Based on 
studies with the pre-post analysis design, however, programs that offer collaborative networking 
for participating teachers show near-zero impact (ES = .01, n = 6 effects). 

Among studies with a pre-post analysis design, 15 effect sizes came from programs with two 
types of coherence, the one coherence category with a significant positive mean effect size (.32), 
as contrasted to -.19 (none), .12 (one type), and .00 (three types). Studies using a post-only 
analysis design had smaller effect sizes than those with a pre-post analysis design. Post-only 
studies with two types of coherence show a positive though smaller mean effect size (.14). 

According to research stemming from the Eisenhower study (Garet et al., 1999, 2001) and 
CCSSO’s cross-state study (Blank et al., 2007), a professional development activity or program 
is more likely to be effective if it is a) consistent with the teacher's school curriculum or learning 
goals for students and/or aligned with state or district standards for student learning or 
performance, b) congruent with the day-to-day operations of schools and teachers, and c) 
compatible with the instructional practices and knowledge needed for the teachers’ specific 
assignments. If the professional development program meets all three criteria and is aligned with 
overall policies and practices in the teacher’s school system, then the professional development 
program helps undergird a supportive environment that encourages improvement in teaching 
practices and aids in the long-term sustainability of the changed practices (Grant, Peterson, & 
Shojgreen-Downer, 1996). 
Table 6: Mean Effect Sizes and Certain Professional Development Designs and Characteristics, Mathematics Studies 

Categories                    Math Pre-Post                                            Math Post-Only
                              Mean ES (SE)  N Effects 95% CI          Q statistic      Mean ES (SE)  N Effects 95% CI          Q statistic

Grade Span                                                            QB(1) = 84.46*                                           QB(2) = 71.24*
  Elementary                  0.32 (0.08)   15        (0.16, 0.49)    Qw = 46.81*      0.27 (0.07)   30        (0.14, 0.41)    Qw = 113.11*
  Middle                      0.01 (0.08)   6         (-0.15, 0.16)   Qw = 22.45*      0.03 (0.04)   17        (-0.04, 0.10)   Qw = 130.75*
  High                        -             -         -               -                0.11 (0.05)   21        (0.01, 0.22)    Qw = 13.68

PD Design Components
Receive Mentoring             -             -         -               -                                                        QB(1) = 5.24*
  Has Mentoring               -             -         -               -                -0.19 (0.24)  10        (-0.67, 0.28)   Qw = 152.16*
  None                        -             -         -               -                0.16 (0.03)   58        (0.11, 0.22)    Qw = 171.39*
Internship                    -             -         -               -                                                        QB(1) = 76.50*
  Has Internship              -             -         -               -                0.21 (0.19)   9         (-0.16, 0.58)   Qw = 76.12*
  None                        -             -         -               -                0.10 (0.03)   59        (0.04, 0.15)    Qw = 176.17*
Collaborative Network (CB)                                            QB(1) = 84.46*   -             -         -               -
  Has CB                      0.01 (0.08)   6         (-0.15, 0.16)   Qw = 22.45*      -             -         -               -
  None                        0.32 (0.08)   15        (0.16, 0.49)    Qw = 46.81*      -             -         -               -

Active Learning
Develop Assessment or Review
Student Work (DA)             -             -         -               -                                                        QB(1) = 16.10*
  Has DA                      -             -         -               -                0.16 (0.03)   58        (0.11, 0.21)    Qw = 171.30*
  None                        -             -         -               -                -0.20 (0.27)  10        (-0.72, 0.33)   Qw = 141.38*

Coherence                                                             QB(1) = 102.97*                                          QB(3) = 32.90*
  None                        -0.19 (0.04)  10        (-0.28, -0.11)  -                0.18 (0.04)   10        (0.11, 0.25)    Qw = 9.65
  1 Type                      0.12 (0.08)   3         (-0.03, 0.27)   Qw = [illegible] -0.43 (0.53)  3         (-1.47, 0.61)   Qw = 81.07*
  2 Types                     0.32 (0.08)   15        (0.16, 0.49)    Qw = 46.81*      0.14 (0.03)   53        (0.07, 0.20)    Qw = 201.72*
  3 Types                     -0.00 (0.12)  2         (-0.24, 0.24)   Qw = 3.80        0.23 (0.12)   2         (0.00, 0.46)    Qw = 3.44

Mean ES (SE) = mean effect size (standard error); N Effects = number of effect sizes per category (across studies identified with at least one significant effect size); *p < .05. If QB is significant a random-effects 
model is applied. If Qw is not significant a fixed-effects model is applied. If Qw is significant a random-effects model is used for that category. QB refers to 
differences between groups. 


CCSSO, Effects of Teacher Professional Development: 2009 





Correlations of Professional Development Design Elements 

Using Pearson’s product-moment correlation statistic (r), we examined the data for 
relationships among various elements of professional development (see Appendix D for the full 
correlation table). Using a significance level of .01 (two-tailed test), positive correlations were 
found among measures of time: contact hours, frequency, and duration. In particular, 
statistically significant positive relationships were found between total contact hours and 
frequency (r = .74), contact hours and duration (r = .83), and frequency and duration (r = .62). 
Among the types of professional development activities, statistically significant positive 
relationships exist between summer institutes and both contact hours (r = .577) and duration 
(r = .655), and between college courses and both contact hours (r = .744) and duration (r = .596). 
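
The kind of test described above can be sketched as follows: compute Pearson's r for two program-level variables, then judge significance with a two-tailed test at the .01 level. The p-value here uses the Fisher z-transformation, a normal approximation chosen so the example needs only the standard library; the report does not say which test was used, so this is an assumption. The data values and function name are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r_with_p(x, y):
    """Return (r, approximate two-tailed p) for paired samples x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        # Perfect correlation: the Fisher transform is unbounded.
        return r, 0.0
    # Fisher z-transformation; approximately normal with SE 1/sqrt(n - 3).
    z = math.atanh(r) * math.sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r, p


# Hypothetical programs: total contact hours and duration in months.
contact_hours = [20, 35, 40, 60, 80, 100, 120, 150]
duration_months = [2, 3, 6, 6, 9, 12, 12, 18]
r, p = pearson_r_with_p(contact_hours, duration_months)
print(f"r = {r:.2f}, significant at .01: {p < 0.01}")
```

With a small number of programs, a coefficient needs to be fairly large to pass the .01 threshold, which is worth keeping in mind when reading the weaker correlations reported in this section.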

These findings confirm that professional development programs that involve summer institutes 
or courses for teachers also provide extensive time (through greater frequency, longer duration, 
and more contact hours). We also found a statistically significant positive correlation between 
frequency and having two ways in which the professional development program 
promotes coherence in teacher learning (r = .794). For example, High Desert MSP and 
Northeast Front Range MSP are geared not only toward teachers who need to acquire the “highly 
qualified” status under NCLB but are also designed so that students of participant teachers can 
meet state expectations for academic performance, as measured by their state assessments. Both 
of these programs provide over 100 hours for their participating teachers to learn and apply their 
learning through intensive summer institutes and follow-up activities during the school year. 

In examining relationships between specific types of professional development activities and 
their means of actively engaging participant teachers in learning, statistically significant positive 
correlations were found between 

• conference and leading a discussion (r = 1.000) 

• summer institutes and developing assessments and reviewing student work (r = .345) 

• summer institutes and observing other teachers (r = .418) 

• study group and receiving classroom mentoring (r = .579) 

• classroom mentoring and engaging in a learning network (r = .796), and 

• classroom mentoring and developing assessments or reviewing student work (r = .883). 

As examples, programs such as Integrated Mathematics Assessment (Saxe, Gearhart & Nasir, 
2001) and Researchers in Every Classroom (Palmer & Nelson, 2006) are reported to actively 
engage teachers by providing them opportunities to observe other teachers and develop 
assessments or review their own students’ work in summer institutes. Programs that incorporate 
study groups, such as the NRCCTE model (Snippe, 1992) and the Mathematics Curriculum 
Improvement Project (Jagielski, 1991), provide their participant teachers the opportunity to be 
actively engaged through classroom mentoring and being part of a professional learning network. 
The data also show that when professional development programs offer classroom mentoring, they 
are likely to engage those teachers in developing assessments and reviewing student work 
during those mentoring sessions. 
Summary of Findings 


The CCSSO meta analysis of studies of teacher professional development programs in 
mathematics and science found that 16 studies reported significant effects of teacher 
development on improving student achievement. The evidence for the findings in the 16 studies 
was based on scientific research designs. These studies reported effect sizes for student 
achievement gains for a treatment group as compared to a control group, and the studies provided 
adequate data and documentation for the CCSSO research team to compute or re-analyze effect 
sizes. The large majority of the studies (12 of 16) focused on analyzing mathematics teacher 
professional development and effects on student achievement in mathematics. The mean effect 
size for mathematics studies using a pre-post design is 0.21. These results show a consistent 
positive effect on gains in student achievement in mathematics from teacher professional 
development in mathematics education. The mean effect size for math studies using a 
posttest-only design is 0.13, indicating that student achievement is higher for students of teachers 
receiving professional development in math education than for students of comparable teachers 
who did not participate in professional development. Our meta-analysis identified four studies of 
professional development in science that had significant effects on student achievement. 

The results for the 16 studies with effect sizes demonstrate to the education research and policy 
communities how meta analysis can and should be used in education to provide comparisons and 
aggregations of research findings over time and across many different studies. The process of 
review and analysis employed by CCSSO involved several thousand citations, initial 
pre-screening of more than 400 documents, and intensive coding and review of 74 studies. The methods of 
identifying, coding, and quantifying data used in the study can be employed for a variety of 
objectives in education research. 

CCSSO reviewed the professional development program designs and learning goals documented 
in the 16 studies. We found several common patterns. The program designs included strong 
emphasis on teachers learning specific subject content as well as pedagogical content for how to 
teach the content to students. The implementation of professional development included 
multiple activities to provide follow-up reinforcement of learning, assistance with 
implementation, and support for teachers from mentors and colleagues in their schools. In terms 
of duration of development activities, 14 of the programs continued for six months or more. The 
mean contact time with teachers in program activities was 91 hours. 

The number of teachers involved in the programs that were analyzed and found to be 
effective varied from fewer than ten to more than 90. The research and evaluation for the 16 
studies employed multiple measures of student achievement and outcomes. The studies' analyses 
of effects on student achievement included scales to measure learning in specific content areas 
(e.g., algebra, measurement). The use of multiple measures allowed use of different types of test 
items. A majority of the studies analyzed professional development for elementary and middle 
grades teachers. The analysis of effects showed a pattern of stronger effects for elementary level 
professional development than for middle or high school teachers. 
Effect sizes were larger when measures of achievement were used that were specifically selected 
or developed to be aligned with the content focus of the professional development. However, the 
review of research did identify several studies with significant effects using large-scale statewide 
assessment programs. This result demonstrates to evaluators and decision-makers that the effects of 
professional development can be measured with readily available data through annual student 
assessments. However, the outcomes are not likely to appear as positive or consistent as with an 
outcome measure specific to the treatment goals. Some studies that computed separate effect 
sizes by student grade, such as the Meta Associates 2006 study, showed that effects of 
professional development differed markedly by grade (e.g., posttest-only results show strong 
positive effects in grade 6, negative effects in grade 7, and no effect in grade 8). Wide variation 
by grade may indicate that teachers’ fidelity of implementation of their professional learning is 
related to the curriculum, or this kind of result may indicate differences in the content covered in 
student assessment instruments by grade. 

One question that has been addressed in prior research is the effect of professional development 
on teachers and their knowledge and practice. The CCSSO meta analysis review did not include 
systematic identification or review of intervening measures of the professional development 
treatment, such as measures of gains in teacher knowledge, improvement in practices, or fidelity 
of implementation of what was learned. Several of the studies identified did report analysis of 
differences on these kinds of measures between teachers in the treatment and control groups. 
Further analysis across studies would provide stronger evidence and useful information about the 
relationship between professional learning of teachers from a specific initiative and subsequent 
improved learning by students. 

The CCSSO meta analysis results show important cross-study evidence that teacher professional 
development in mathematics does have significant positive effects on student achievement. The 
analysis results also confirm the positive relationship to student outcomes of key characteristics 
of design of professional development programs that have been documented in prior research. 
The meta-analysis process and procedures carried out by CCSSO show strong potential for 
broader use and application in judging the validity and consistency of results across a range of 
education initiatives and the evidence of outcomes from the initiatives. 

Meta-Analysis Results: How Findings Can Be Used by State Leaders 

Based on the results of the meta-analysis of findings from teacher professional development 
studies, CCSSO can state several recommendations for how the results and processes from the 
meta-analysis can be useful to researchers, evaluators, and state education leaders. 

• The meta-analysis design and procedures employed by CCSSO proved to be effective in 
identifying a set of common findings regarding effects of teacher professional development 
on student achievement, and the procedures proved useful to determine which studies and 
their results met high standards for scientific validity and reliability. 

• A scientific research design can be efficiently employed to evaluate teacher professional 
development, and a design to measure effects of teacher development on subsequent student 
achievement should be strongly considered for each funded program for teacher and teaching 
improvement. 
• The use of research designs involving treatment and control groups should become a regular 
practice and be built into the plan and organization for professional development and other 
initiatives. 

• Measures of implementation of professional development are critical to evaluation design in 
order to document and measure activities to reinforce and extend learning for teachers in 
their school setting. 

• Multiple measures of student achievement should be included in the research design if 
possible to provide for different types of assessments of learning and analysis of subject 
content learned. 

• State and local education data systems can be accessed by providers of professional 
development and evaluators and regular statewide or district-wide assessment instruments 
can be effective measures of outcomes. 

• State leaders should ensure that data systems are structured so that data on teacher 
development initiatives can be linked to student achievement measures, and these data can be 
effective for evaluation even where individual identifiers are removed. 

• Procedures for meta analysis modeled in this study provide a consistent, quantified 
methodology for application and use in other studies, including initial identification, multiple 
coding and validation of reviews, comparison of research design with established criteria, 
and consistent procedures for effect size analysis and coding of treatment variables. 
Appendix A: Meta Analysis Coding Form Excerpt: Scaffolded Guide for Determining Inclusion of a Document 


The document review process for the meta analysis study is aided by coding forms that coders and reconcilers complete in order to record systematically how they 
determined whether a document is included in the pool of studies to be analyzed. The systematic review is conducted with at least two coders, with a third 
person as a reconciler. The coding forms are Excel files composed of multiple spreadsheets that 1) assist in determining whether a document in question 
is a candidate to be included in the analysis and 2) aid in the extraction of key data needed for the analysis. Each coder completes one form per document 
independently from his/her partner coder. The reconciler completes another similar form that combines the information from both coders by bringing their entered 
information side-by-side. The excerpt shown below is the first spreadsheet, which guides a coder through the process of determining the viability of a document for 
entry into the pool of studies to analyze. At certain decision junctures, the coder is required to consider whether the document should continue to the next round of 
reviews or should be rejected. A document could be rejected at any of the decision points of the review process. 

CCSSO, 2007 - NSF Grant No. REC-0635409 


Coder's First Name: 

Unique Document No.: 

Form columns: Coded Fields | Coding Decision | Page No. / Section / Paragraph No. | Additional Instructions for Coding Decision 

Bibliographic Information 

1. a. Author 
      Instructions: [Text] Use APA style for references. Example: Marek, E. A., & Methven, S. B. (1991). Effects of the 
      learning cycle upon student and classroom teacher performance. Journal of Research in Science Teaching, 28(1), 41-53. 
   b. Date 
   c. Report Title 
   d. City, State: [remainder of field label illegible] 

Stage 1 Coding - Part I: Relevance of Document 

2. Was the document published between Jan. 1, 198[final digit illegible] and August 31, 2007? 
      Instructions: 1 = Yes, 0 = No [Binary]. 

3. Is the document reporting on a study that took place in the U.S. or its territories? 
      Instructions: 1 = Yes, 0 = No [Binary]. 

4. Does the document report on study findings involving K-12 students and their teachers? Specify grade level(s). 
      Instructions: 1 = Yes, 0 = No [Binary]. 

5. a. Is the document from a journal, a book, a book chapter, a thesis/dissertation, or an unpublished report? 
      If yes, enter document type. 
      Instructions: 1 = Yes, 0 = No [Binary], and [Text]. (Conference proceedings are acceptable only if they satisfy all 
      other pertinent conditions in the coding form.) 
   b. Does the document contain an empirical quantitative study? It should not be one of the following, any of which makes it 
      ineligible for review: 
      - literature review 
      - research synthesis or meta-analysis 
      - case study 
      - qualitative study 
      - commentary 
      - opinion paper, or 
      - theoretical paper (e.g., presenting a hypothesis or a model) 
      Instructions: 1 = Yes, 0 = No [Binary]. 

6. Does the document feature an in-service teacher professional development program or a set of professional development 
   activities for teachers? 
      Instructions: 1 = Yes, 0 = No [Binary]. Answer No if a) the study is focused on pre-service teacher preparation, or b) the 
      study is focused on comprehensive reform models, curriculum, instructional models, teaching materials, assessments, 
      or policies with little attention to professional development as a primary focus. 
Appendix A continued 

7. Is the professional development in math and/or science? If yes, enter subject. 
      Instructions: [Responses: Math, Science, Both]. If Both, notify Nina at CCSSO and create another coding form 
      with a new unique document number for the additional subject. There will be two coding analyses to reflect the two 
      subjects covered. 

8. a. Student achievement outcomes in mathematics may include number sense, geometric concepts, algebraic concepts, 
      measurement, data analysis, and logical reasoning. 
      Instructions: 1 = Yes, 0 = No [Binary]. 
   b. Student achievement outcomes in science may include knowledge in earth science, life science, and physical science; 
      science inquiry skills; scientific reasoning; science experiment design; data interpretation and analysis; hypothesis 
      testing; and explanation formulation from evidence. 
      Instructions: 1 = Yes, 0 = No [Binary]. 
   c. Does the study provide at least one student achievement outcome in math or science as an effect of in-service teacher 
      professional development? 
      Instructions: 1 = Yes, 0 = No [Binary]. 

9. Does the study examine the effects of in-service professional development on teacher outcomes such as knowledge and 
   skills, beliefs and attitudes, and/or instructional practice? 
      Instructions: 1 = Yes, 0 = No [Binary]. 

10. Eligible designs: 
   a. Describe briefly the type of study and the study methods used. If the manuscript is an empirical study, note all major 
      components of the study. 
      Instructions: [Text]. 

   Randomized Controlled Trials (RCT) 
   b. Was random assignment used to place participants into different study groups? 
      OR 
      If a randomization procedure was not used, were participants placed into intervention groups using a process that was 
      haphazard and functionally random? (see Appendix_Glossary) 
      Instructions: 1 = Yes, 0 = No [Binary]. An answer of "1" to either of these questions leads to a categorization of the 
      study as a randomized controlled trial. However, the fact that haphazard assignment was used will be noted in the 
      write-up of the intervention report. 
      If response is "1", skip Q c through e and go to Stage 1 - Part I Decision. If "0", go to the next question below. 

A document has to meet one of only four types of research designs to be considered for inclusion: randomized control trials, quasi experimental designs, single 
subject design or regression discontinuity. This is to guarantee that the document captures an empirical study. 
Appendix A continued 

   Quasi-Experimental Design (QED) 
   c. Is the design a quasi-experimental design with EITHER 1) statistical controls for study participants' 
      characteristics (e.g., teacher content knowledge or student pre-test achievement measures), OR 2) comparison 
      group(s) matched on study participants' characteristics? 
      Instructions: 1 = Yes, 0 = No [Binary]. An answer of "1" to either of these questions leads to a categorization of the 
      study as a quasi-experimental design. 
      If response is "1", skip questions d through e and go to Stage 1 - Part I Decision. If "0", go to the next question. 

   Single Subject Design (SSD) 
   d. Is the document's study a single-subject design? 
      Instructions: 1 = Yes, 0 = No [Binary]. A single-subject design involves an individual subject whose behavior is 
      observed for changes associated with the intervention or removal of treatment. 
      If response is "1", skip question e and go to Stage 1 - Part I Decision. If "0", go to the next question. 

   Regression Discontinuity Design (RDD) 
   e. Is the document's study a regression discontinuity design? 
      Instructions: 1 = Yes, 0 = No [Binary]. A regression discontinuity design uses a pretest-posttest program-comparison 
      group strategy, but has the unique characteristic of assigning participants to program or comparison groups based 
      solely on a cutoff score on a pre-program measure. 

Stage 1 - Part I Decision 
      Instructions: 1 = Pass, 0 = Fail. 
      If marked "1" to all Q 2 through 8c AND one in Q 10, the document passes Part I. Proceed to next question. If not, 
      mark as "0" to fail and stop - the document is ineligible for further review. 

Stage 1 - Part II: Outcome Measures & Methodology 

11. a. Describe the student ACHIEVEMENT outcome measures and constructs reflected in the outcome measures and the 
      approach to measurement in Table 1a. 
      Instructions: Complete Table 1a. Please note that we are only concerned about student achievement outcomes. 
      Therefore, outcome measures on student beliefs or attitudes should be excluded from Table 1a. 
   b. Only after passing Stage 1 - Part I Decision: does one or more of the student achievement outcome measures in 
      Table 1a (1) have face validity, OR (2) report reliability, OR (3) is a standardized test? 
      Instructions: 1 = Yes, 0 = No [Binary]. 

The scaffolded guide spreadsheet is followed by additional spreadsheets whereby coders/reconcilers record data on a) student outcome measures and constructs 
(validity & reliability of those measures, Table 1a); b) teacher outcome measures and constructs (Table 1b); c) number of teachers participating in the study, by 
treatment and control groups (Figure 1a); d) number of students participating in the study, by treatment and control groups (Figure 1b); e) teacher characteristics, by 
treatment and control groups (Table 2); f) characteristics of students, by treatment and control groups (Table 3); g) characteristics of the professional development 
initiative; and h) estimates of treatment effects (effect sizes), by outcome measures (Tables 5a-d). 
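
The Stage 1 - Part I decision rule in the form can be expressed as a small function. This is a sketch under stated assumptions, not the study's actual tooling (the form itself is an Excel workbook). The field names are hypothetical, and two readings of the rule are assumed: Q 7 counts as satisfied when a subject (Math, Science, or Both) is entered, and Q 8 counts as satisfied when Q 8c is marked "1", since a single-subject study would not mark both Q 8a and Q 8b.

```python
# Hypothetical field names for the binary relevance items (Q 2 through Q 8c).
BINARY_ITEMS = ("q2", "q3", "q4", "q5a", "q5b", "q6", "q8c")
# Hypothetical field names for the four eligible designs in Q 10.
DESIGN_ITEMS = ("q10b_rct", "q10c_qed", "q10d_ssd", "q10e_rdd")
SUBJECTS = {"Math", "Science", "Both"}


def stage1_part1_decision(answers, subject):
    """Return 1 (pass) or 0 (fail) for one coder's Stage 1 - Part I review."""
    relevance_ok = all(answers.get(item) == 1 for item in BINARY_ITEMS)
    subject_ok = subject in SUBJECTS
    # The document must match at least one of the four eligible designs.
    design_ok = any(answers.get(item) == 1 for item in DESIGN_ITEMS)
    return 1 if (relevance_ok and subject_ok and design_ok) else 0


def needs_reconciliation(decision_a, decision_b):
    """Flag a document for the reconciler when the two coders disagree."""
    return decision_a != decision_b


# Hypothetical coder entries for one document.
coder_a = {item: 1 for item in BINARY_ITEMS}
coder_a["q10c_qed"] = 1
coder_b = dict(coder_a, q3=0)  # second coder judged the study to be outside the U.S.

a = stage1_part1_decision(coder_a, "Math")
b = stage1_part1_decision(coder_b, "Math")
print(a, b, needs_reconciliation(a, b))
```

Encoding the rule this way makes the stop condition explicit: a single "0" on any relevance item, or no eligible design, ends the review for that document.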
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Appendix A continued 


Coder's First Name 



Unique Document No : 




Coded Fields 

Codtog Otihiun 

P<rj€ No 

iSocBmmi * 
No. 

Additional Instructions for cooing decision 


c. Only alter pass. r»g Stage l-Part l Decision, are 
elec: sizes reportea n tne study? 

CR 

is mere any ir'cmaticn tnat wii allow computation of 
e*Tect sizes *or one or more of tre measures atove 7 
For exanpe — 

- Mean standard deviation (or standard error;, ard 
sampe size n eacn group 

- Cirterence in means pooled standard deviation 
(or standard error), and sample size in eacn gn>up 

- Means, sample size ano t-value (for two 
Independent groups) 

- Difference in means sample size anc p-vaiue 

- Proportion and sample size ir eacn group 

- Regression coefficients - o'dlnary east squares 
iOLSi or nierarcficai inea'rrooeing |HLM) 
standard errors degree of freedom, ard sample 
size in eaor group 7 




i-Yes C-No [Binary]. 

12. 

Descrtce tne teacner outcome measures ano 
constructs reflected In tne outcome measures ard tne 
app-oacr to Teasu r ement in Tat<e id. 




Compete Table fb. Please include ALL teacher 
outcome measures ncludlng those on teacher 
content’pedagogicai content knowledge. Dellefe. 
and attitudes. 


According to tne reconciliation, the design or tnis 
study is: 




Make sure to complete Information Tor the 
applicable design 

If RCT. go to Q 13 
IfQED. go to Q15. 

If Single-subject design, go to Q 16. 

If Regression discontinuity design, go to Q 17. 


13. (RCT design only) 

a. Describe specific details of the procedure of randomization or a procedure that was haphazard and functionally random. 

[Text]. 

b. Did the authors provide details of the randomization procedure in the document? 

1-Yes, 0-No [Binary]. For an RCT to meet Evidence Standards, the study participants (e.g., teachers, classrooms, or students) should have been placed to each study condition through random assignment or a process that was haphazard and functionally random. If the assignment process in an RCT is truly random or functionally random as described above, the RCT Meets Evidence Standards, unless one or more of the following conditions is violated: (1) randomization, (2) attrition, (3) teacher-intervention confound, and (4) intervention disruption. 

c. Have the study participants (e.g., teachers or students) been placed to each study condition through random assignment or a process that was haphazard and functionally random? 

d. Is the study free of any other problems with randomization (e.g., subjects being replaced or switched between groups after initial random assignment)? 

e. Does the RCT study have any randomization problem? 

1-Yes, 0-No [Binary]. Based on the responses in questions a-d, respond either "1" or "0". 

14. (RCT design only) 

a. Describe how the author(s) addressed baseline equivalence for both student and teacher data. If there are any concerns of incomparability, describe them as well. 

Complete Table 2 - teachers' characteristics prior to professional development. In addition, complete Table 3 - students' characteristics prior to their teachers' professional development. Use "NR" in the major cells to mean that the data was not reported. Use "NA" to mean that the question and subsequent response(s) are not applicable. 

b. Is Table 2 - Teachers' Characteristics completed? 

1-Yes, 0-No [Binary]. 

c. Is Table 3 - Student Characteristics completed? 

1-Yes, 0-No [Binary]. 

d. Is there incomparability in teacher baseline characteristics that is NOT corrected for in the impact estimates reported? If so, please describe. 

1-Yes, 0-No [Binary]. And [Text]. 

e. Is there incomparability in student baseline characteristics that is NOT corrected for in the impact estimates reported? If so, please describe. 

1-Yes, 0-No [Binary]. And [Text]. 
Skip to Q 18. 

15. (QED design only) 

a. Describe how the author(s) addressed baseline equivalence for both student and teacher data. If there are any concerns of incomparability, describe them as well. 

Complete Table 2 - teachers' characteristics prior to professional development. In addition, complete Table 3 - students' characteristics prior to their teachers' professional development. 

b. Is Table 2 - Teachers' Characteristics completed? 

1-Yes, 0-No [Binary]. 

c. Is Table 3 - Student Characteristics completed? 

1-Yes, 0-No [Binary]. 

d. Was equating (teachers and/or students) accomplished through matching (involves creating or identifying intervention and comparison groups that "look" similar on a pretest of the outcome measure)? (Criteria for matching may include some demographic variables.) 

1-Yes, 0-No [Binary]. 

e. Was equating (teachers and/or students) accomplished through statistical adjustment (involves using statistical procedures, e.g., covariate adjustment in an ANCOVA, to equate groups on pretest and address baseline incomparability in the impact analysis)? 

1-Yes, 0-No [Binary]. 

f. Do the teacher (treatment and comparison) groups appear to be patently incomparable at baseline, and was the incomparability unlikely to be adequately addressed through statistical adjustment? 

1-Yes, 0-No [Binary]. 

If response is "1", this is an indication of baseline equivalence problem. 

g. Do the student (treatment and comparison) groups appear to be patently incomparable at baseline, and was the incomparability unlikely to be adequately addressed through statistical adjustment? 

1-Yes, 0-No [Binary]. 
Skip to Q 18. 

16. (Single subject designs only): 

a. Was the sample size one? 

1-Yes, 0-No [Binary]. 

b. Was a single-subject design most appropriate or would a group design be a better option? 

1-Yes, 0-No [Binary]. 

c. Were the observation conditions standardized? 

1-Yes, 0-No [Binary]. 

d. Was the behavior that was observed defined operationally? 

1-Yes, 0-No [Binary]. 

e. Was the measurement highly reliable? 

1-Yes, 0-No [Binary]. 

f. Were sufficient repeated measures taken? 

1-Yes, 0-No [Binary]. 

g. Were the conditions in which the study was conducted described fully? 

1-Yes, 0-No [Binary]. 

h. Was there stability in the baseline condition before the treatment was introduced? 

1-Yes, 0-No [Binary]. 

i. Was there a difference between the length of time or number of observations between the baseline and the treatment conditions? 

1-Yes, 0-No [Binary]. 

j. Was only one variable changed during the treatment condition? 

1-Yes, 0-No [Binary]. 

k. Were threats to internal and external validity addressed? 

1-Yes, 0-No [Binary]. 
Skip to Q 18. 

17. (For regression discontinuity designs only): 

a. Was the cut-off criterion followed without exception? 

Describe cut-off criterion. 

1-Yes, 0-No [Binary]. And [Text]. 

b. Does the pre-post distribution follow a polynomial function? 

1-Yes, 0-No [Binary]. If the true pre-post relationship is logarithmic, exponential, or some other function, the model given below is misspecified and estimates of the effect of the PD are likely to be biased. 

c. Does the comparison group have pretest variance? 

1-Yes, 0-No [Binary]. There must be a sufficient number of pretest values in the comparison group to enable adequate estimation of the true relationship (i.e., pre-post regression line) for that group. 

d. Do the treatment and comparison group come from a single continuous pretest distribution with the division between groups determined by the cut-off score? 

1-Yes, 0-No [Binary]. Both groups must come from a single continuous pretest distribution with the division between groups determined by the cutoff. In some cases one might be able to find intact groups (e.g., two groups of patients from two different geographic locations) which serendipitously divide on some measure so as to imply some cutoff. Such naturally discontinuous groups must be used with caution because of the greater likelihood that if they differed naturally at the cutoff prior to the program such a difference could reflect a selection bias. 

e. Is the PD program uniformly implemented to all recipients under the same conditions (duration, frequency, length and sequence)? 

1-Yes, 0-No [Binary]. 

18. (All designs) 

a. Describe any teacher-intervention confound problems. 

[Text]. 

b. Does the study assign more than one teacher per condition? 

1-Yes, 0-No [Binary]. A teacher-intervention confound occurs when only one teacher is assigned to each condition. (NO means that there is just one teacher per condition.) 

If response is "1", skip the next two questions and go to Q 19. 




CCSSO, Effects of Teacher Professional Development: 2009 


c. If there is only one teacher per condition, is there any evidence that teacher effects are negligible? 

1-Yes, 0-No [Binary]. If response is "1", then go to the next question. 

d. Does the study have any teacher-intervention confound problem? 

1-Yes, 0-No [Binary]. Answer "1" if "0" are the responses to the two prior questions. 

19. (All designs) 

a. Describe any overall or differential attrition problems, either reported by the authors or detected by the coders. 

[Text]. Complete Figure 1 by providing the total number of participants (i.e., teachers and students) as well as the number of participants who dropped out of the study and/or the analysis. 

Make sure to indicate the unit of assignment or analysis, and specify the unit itself (e.g., student, teacher, class, or school). 

b. Is there any severe overall attrition problem in the study, either reported by the authors or detected by the coders? 

1-Yes, 0-No [Binary]. Overall attrition is defined as a failure to measure the outcome variable on all the participants initially assigned to the intervention and comparison groups. (If a study begins with 100 students total and ends up with 79 students total: 79/100 = 0.79, then subtract from 1.0. Attrition is 1.0 - 0.79 = 0.21, or 21%.) Coders will determine on a case-by-case basis if there is a severe overall attrition problem in the study. 

c. Is there any severe differential attrition problem in the study, either reported by the authors or detected by the coders? 

1-Yes, 0-No [Binary]. Differential attrition refers to the situation in which the percentage of the original study sample retained in the follow-up data collection is substantially different for the intervention and the comparison groups. Severe differential attrition makes the results of a study suspect because it may compromise the comparability of the study groups. Coders will determine on a case-by-case basis if there is a severe differential attrition problem in the study. 

d. Did the author(s) present evidence of post-attrition group equivalence on pretest data (see instructions)? 

1-Yes, 0-No [Binary]. If the authors did not report overall and differential attrition, they must present evidence of post-attrition group equivalence on pretest data. Post-attrition group equivalence on pretest data may be demonstrated by a well-powered (0.80) test of equivalence that is non-significant, or a standardized mean difference between groups of less than d = 0.10. 

e. Does the study have any attrition problems? 

1-Yes, 0-No [Binary]. Unless response to Q 19d is "1", code as "1" if Q 19b or Q 19c is "1". Otherwise, code as "0". 
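
The attrition arithmetic in the instructions for Question 19 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and treating differential attrition as the absolute difference between the two groups' attrition rates is an assumption, since the form leaves the judgment of "severe" to the coders on a case-by-case basis.

```python
def overall_attrition(n_assigned: int, n_measured: int) -> float:
    """Share of initially assigned participants with no outcome measure."""
    if n_assigned <= 0:
        raise ValueError("n_assigned must be positive")
    if not 0 <= n_measured <= n_assigned:
        raise ValueError("n_measured must be between 0 and n_assigned")
    return 1.0 - n_measured / n_assigned


def differential_attrition(n_assigned_t: int, n_measured_t: int,
                           n_assigned_c: int, n_measured_c: int) -> float:
    """Absolute gap between treatment and comparison attrition rates.

    Assumption: the form only says the retained percentages are
    "substantially different"; the absolute difference is one simple
    way to express that gap.
    """
    rate_t = overall_attrition(n_assigned_t, n_measured_t)
    rate_c = overall_attrition(n_assigned_c, n_measured_c)
    return abs(rate_t - rate_c)


# Worked example from the form: 100 students at the start, 79 at the end.
print(round(overall_attrition(100, 79), 2))

# Hypothetical groups: 45 of 50 retained in treatment, 34 of 50 in comparison.
print(round(differential_attrition(50, 45, 50, 34), 2))
```

The functions return proportions rather than a pass/fail code, because the form asks coders to decide severity themselves.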

20. (All designs) 

a. Describe any problem of disruption or contamination in intervention. 

[Text]. Intervention contamination occurs when something happens after the beginning of the intervention and affects the outcome for the intervention or the comparison group, but not both. Describe any disruptions of the intervention or control condition, any contamination of the treatment group, or any contamination of the comparison group. 

b. Is there evidence of obvious disruption or intervention contamination that could have caused observed differences between the intervention and control groups? 

1-Yes, 0-No [Binary]. 

Indication of problem with disruption or contamination in intervention. 

21. (All designs) 

a. Were the unit of assignment and analysis described? 

1-Yes, 0-No [Binary]. 

b. Describe the unit of group assignment. 

[Text]. 

c. Describe the unit of analysis. 

[Text]. 

d. Does the unit of analysis match with the unit of assignment? 

If there is a misalignment between unit of assignment and analysis, clustering corrections should be made. Notify Nina at CCSSO about the issue. 

22. (All designs) 

a. Were there any serious violations of statistical assumptions or any serious bias in reporting of findings? 

1-Yes, 0-No [Binary]. If responding as "1", go to next question. If "0", skip the next question and go to Q 23. 

b. Describe any serious violation of statistical assumptions or bias in reporting of findings. 

[Text]. 


SUM of the number of problems in randomization, baseline equivalence, attrition, teacher-intervention confound, or disruption of intervention [from min of 0 to max of 4] 

0 

The default starting value is zero (0). 
Stage I - Part II Decision 

1-Pass, 0-Fail [Binary]. The document passes and goes on for further review, 

IF: 

- As an RCT, it has NO or ONE problem in randomization, attrition, teacher-intervention confound, or disruption. 

- As a QED, it has NO problem in baseline equivalence, attrition, teacher-intervention confound, or disruption. 

- As an SS design, it met ALL the conditions. 

- As an RD design, it met ALL the conditions. 

Else, if the document fails, the document is ineligible for further review. 
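
The Stage I - Part II pass/fail rule can be written as a small decision function. The sketch below is illustrative only: the function name, the design labels ("RCT", "QED", "SS", "RD"), and the argument names are assumptions made for the example, and the problem count is taken to be the SUM field recorded on the form.

```python
def stage_one_part_two(design: str, n_problems: int = 0,
                       met_all_conditions: bool = False) -> int:
    """Return 1 (pass) or 0 (fail) for the Stage I - Part II decision.

    design: "RCT", "QED", "SS" (single-subject), or "RD"
            (regression discontinuity); labels are hypothetical.
    n_problems: count of flagged problems (used for RCT and QED).
    met_all_conditions: whether every design condition was met
                        (used for SS and RD).
    """
    design = design.upper()
    if design == "RCT":
        # An RCT passes with no problem or exactly one problem.
        return int(n_problems <= 1)
    if design == "QED":
        # A QED passes only with no problems at all.
        return int(n_problems == 0)
    if design in ("SS", "RD"):
        # Single-subject and RD designs must meet all conditions.
        return int(met_all_conditions)
    raise ValueError(f"unknown design: {design}")


print(stage_one_part_two("RCT", n_problems=1))
print(stage_one_part_two("QED", n_problems=1))
print(stage_one_part_two("RD", met_all_conditions=True))
```

The asymmetry between RCTs and QEDs is the point of the rule: a randomized design is allowed one flagged problem, while a quasi-experimental design is allowed none.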

Stage II Coding: Documentation of Effect Sizes and PD Features 

23. (Effect sizes) 

a. Document overall and subgroup means, standard deviation (SD), and N size for both treatment and comparison groups and the time of measurement (e.g., pretest, posttest, follow-up test). 

Are Tables 5a, b, c, and d completed? 

1-Yes, 0-No [Binary]. 

Complete table 5a to enter outcome measures that are based on continuous variables; complete table 5b to enter outcomes that are based on dichotomous variables (e.g., percent proficiency); complete table 5c to make Benjamini-Hochberg corrections and table 5d for clustering corrections. 

b. Do the effect sizes need to be computed using non-standard formulas? 

1-Yes, 0-No [Binary]. If Yes, set aside. Notify Nina at CCSSO for assistance in computing effect sizes. 

c. Do results pertain to multiple periods of follow-ups beyond the posttest? 

1-Yes, 0-No [Binary]. 
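
Assuming the multiple-comparison correction recorded in Table 5c is the Benjamini-Hochberg procedure, the step-up rule can be sketched as follows. The function name and the 0.05 level are illustrative choices, not values taken from the coding form.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag which hypotheses are rejected under the BH step-up rule.

    Sort the m p-values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    Returns a list of booleans in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its BH threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four outcome measures in one study.
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.20]))
```

Because the rule looks for the largest qualifying rank, a p-value can be rejected even when it exceeds its own threshold, provided a larger p-value further down the sorted list meets its threshold.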

24. (PD features) 

a. Document the characteristics of the professional development intervention. 
Is Table 4 completed? 

1-Yes, 0-No [Binary]. Complete table 4. 

b. Based on the information provided on the content and implementation of the professional development, is there enough information to facilitate replication of the intervention? 

1-Yes, 0-No [Binary]. 
c. Is/are the author(s) of the document the evaluator of the intervention, OR designer of the intervention, AND/OR implementer of the intervention? 

If Yes, enter type(s). 

1-Yes, 0-No [Binary]. And [Text]. 

25. 

Was the effect of the professional development on student achievement confounded with the effect of curriculum? 

1-Yes, 0-No [Binary]. Often it is difficult to disentangle the effect of professional development on student achievement from the effect of related curriculum if they are interwoven in the PD activity. 

26. 

Do the measures for student outcomes align with the professional development? 

1-Yes, 0-No [Binary]. Misalignment between the student outcome measures and professional development introduces analytic complexities and limits interpretations of results. 


Additional Comments 

[Text]. Add any other information that will assist in capturing the nature of the study design, measures, outcome results, and/or context. 

Stage II Status 

1=Completed, 0=To Be Determined/In Progress [Binary]. 


Additional information about the coding form can be found in 

http://www.ccsso.org/proiects/improving evaluation of professional development/Meta Analysis Study/ 
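
The coding form asks whether a study reports enough information to compute effect sizes, for instance group means, standard deviations, and sample sizes, or a t-value with sample sizes. As a hedged illustration, the sketch below uses the textbook pooled-standard-deviation form of Cohen's d and the standard conversion from an independent-groups t-value. The report does not state here which formulas the coders applied, so the function names and formulas are assumptions for the example.

```python
import math


def cohens_d(mean_t: float, mean_c: float, sd_t: float, sd_c: float,
             n_t: int, n_c: int) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def d_from_t(t_value: float, n_t: int, n_c: int) -> float:
    """Cohen's d recovered from a t-value for two independent groups."""
    return t_value * math.sqrt(1.0 / n_t + 1.0 / n_c)


# Hypothetical study: treatment mean 105, comparison mean 100,
# both SDs 10, 50 students per group.
print(round(cohens_d(105, 100, 10, 10, 50, 50), 3))

# The same hypothetical study reported only as t = 2.5.
print(round(d_from_t(2.5, 50, 50), 3))
```

Both routes give the same standardized difference for the same underlying data, which is why the form treats either set of statistics as sufficient.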
Appendix B: Effects of Professional Development on Student Achievement, by Study (N = 104) 

| Study (No. of Effects) | Study Design | Outcome Measure | Unit of Analysis | Time of Measurement/Group | Effect Size | Cohen’s d Standard | Posttest Only or Pretest-posttest Gain | Applied correction for clustering or multiple comparisons? |
| Carpenter et al., 1989 (7) | RCT | ITBS (Level 7), Computation | By teacher group (treatment vs. comparison) | Posttest (1st & 2nd) | .41 | Small-medium | Posttest only | Used adjusted mean |
| | | ITBS (Level 7), Problems | By teacher group (treatment vs. comparison) | Posttest (1st & 2nd) | .37 | Small | Posttest only | Used adjusted mean |
| | | Interviews on number facts | By teacher group (treatment vs. comparison) | Posttest (1st & 2nd) | .66 | Medium | Posttest only | Used adjusted mean |
| | | Interviews on problem solving | By teacher group (treatment vs. comparison) | Posttest (1st & 2nd) | .69 | Medium | Posttest only | Used adjusted mean |
| | | Study-specific test, Simple Addition & Subtraction | By teacher group (treatment vs. comparison) | Posttest (1st & 2nd) | .43 | Small-medium | Posttest only | Used adjusted mean |
| | | Study-specific test, Complex Addition & Subtraction | By teacher group (treatment vs. comparison) | Posttest (1st & 2nd) | .42 | Small-medium | Posttest only | Used adjusted mean |
| | | Study-specific test, Advanced Word Problems | By teacher group (treatment vs. comparison) | Posttest (1st & 2nd) | .11 | Small | Posttest only | Used adjusted mean |
| Dickson, 2002 (2) | QED | Texas Assessment of Academic Skills (TAAS) (8th) | By teacher group (treatment vs. comparison) | Posttest (Middle, 8th) | .096608 | Small | Posttest only | No |
| | | End-of-Course Biology Test (9th & 10th) | By teacher group (treatment vs. comparison) | Posttest (High, 9th-10th) | .43029 | Small-medium | Posttest only | No |
| Heller et al., 2007 (6) | RCT | Math Pathways and Pitfalls (MPP) Pitfalls Quiz, Overall | By teacher/class | Posttest (2nd) | .41065 | Small-medium | Posttest only | Yes |
| | | Math Pathways and Pitfalls (MPP) Pitfalls Quiz, Overall | By teacher/class | Elementary (2nd) | .41241 | Small-medium | Pretest-Posttest Gain | Yes |
| | | Math Pathways and Pitfalls (MPP) Pitfalls Quiz, Overall | By teacher/class | Posttest (4th) | 0.763156 | Medium-large | Posttest only | Yes |
Appendix B continued 

| Study (No. of Effects) | Study Design | Outcome Measure | Unit of Analysis | Time of Measurement/Group | Effect Size | Cohen’s d Standard | Posttest Only or Pretest-posttest Gain | Applied correction for clustering or multiple comparisons? |
| | | Math Pathways and Pitfalls (MPP) Pitfalls Quiz, Overall | By teacher/class | Elementary (4th) | .685868 | Medium-large | Pretest-Posttest Gain | Yes |
| | | Math Pathways and Pitfalls (MPP) Pitfalls Quiz, Overall | By teacher/class | Posttest (6th) | .352674 | Small | Posttest only | Yes |
| | | Math Pathways and Pitfalls (MPP) Pitfalls Quiz, Overall | By teacher/class | Elementary (6th) | .271791 | Small | Pretest-Posttest Gain | Yes |
| Jagielski, 1991 (20) | QED | Study-specific assessment MCIP/89, NAEP Level 250-Question 1 | By class | Posttest (3rd-8th, Treatment I vs. Control) | .256549 | Small | Posttest only | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 250-Question 1 | By class | Treatment I vs. Control | .746684 | Medium-large | Pretest-Posttest Gain | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 250-Question 1 | By class | Posttest (3rd-8th, Treatment II vs. Control) | .207456 | Small | Posttest only | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 250-Question 1 | By class | Treatment II vs. Control | .784691 | Medium-large | Pretest-Posttest Gain | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 300-Question 2 | By class | Posttest (3rd-8th, Treatment I vs. Control) | .40038 | Small-medium | Posttest only | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 300-Question 2 | By class | Treatment I vs. Control | .546542 | Medium | Pretest-Posttest Gain | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 300-Question 2 | By class | Posttest (3rd-8th, Treatment II vs. Control) | .057441 | Medium | Posttest only | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 300-Question 2 | By class | Treatment II vs. Control | .366257 | Small | Pretest-Posttest Gain | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 350-Question 3 | By class | Posttest (3rd-8th, Treatment I vs. Control) | .274124 | Small | Posttest only | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 350-Question 3 | By class | Treatment I vs. Control | .20929 | Small | Pretest-Posttest Gain | Yes |
Appendix B continued 

| Study (No. of Effects) | Study Design | Outcome Measure | Unit of Analysis | Time of Measurement/Group | Effect Size | Cohen’s d Standard | Posttest Only or Pretest-posttest Gain | Applied correction for clustering or multiple comparisons? |
| | | Study-specific assessment MCIP/89, NAEP Level 350-Question 3 | By class | Posttest (3rd-8th, Treatment II vs. Control) | .159811 | Small | Posttest only | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 350-Question 3 | By class | Treatment II vs. Control | .137631 | Small | Pretest-Posttest Gain | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 350-Question 4 | By class | Posttest (3rd-8th, Treatment I vs. Control) | .396558 | Small | Posttest only | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 350-Question 4 | By class | Treatment I vs. Control | .252577 | Small | Pretest-Posttest Gain | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 350-Question 4 | By class | Posttest (3rd-8th, Treatment II vs. Control) | .259288 | Small | Posttest only | Yes |
| | | Study-specific assessment MCIP/89, NAEP Level 350-Question 4 | By class | Treatment II vs. Control | .664996 | Medium | Pretest-Posttest Gain | Yes |
| | | Study-specific assessment MCIP/89, Question 5 | By class | Posttest (3rd-8th, Treatment I vs. Control) | .058814 | Small | Posttest only | Yes |
| | | Study-specific assessment MCIP/89, Question 5 | By class | Treatment I vs. Control | -.42439 | | Pretest-Posttest Gain | Yes |
| | | Study-specific assessment MCIP/89, Question 5 | By class | Posttest (3rd-8th, Treatment II vs. Control) | -.26524 | | Posttest only | Yes |
| | | Study-specific assessment MCIP/89, Question 5 | By class | Treatment II vs. Control | -.41516 | | Pretest-Posttest Gain | Yes |
| Lane, 2003 (2) | QED | Constructed CSAP, Overall | By teacher group (treatment vs. comparison) | Posttest (Elementary) | .08 | Small | Posttest only | No |
| | | Constructed CSAP, Overall | By teacher group (treatment vs. comparison) | Elementary | 0.126908 | Small | Pretest-Posttest Gain | Yes |
Appendix B continued 

| Study (No. of Effects) | Study Design | Outcome Measure | Unit of Analysis | Time of Measurement/Group | Effect Size | Cohen’s d Standard | Posttest Only or Pretest-posttest Gain | Applied correction for clustering or multiple comparisons? |
| META Associates, 2006 (5) | QED | Colorado Student Assessment Program (CSAP) | By teacher group (treatment vs. comparison) | Posttest (Middle, 6th) | .22 | Small | Posttest only | No |
| | | Colorado Student Assessment Program (CSAP) | By teacher group (treatment vs. comparison) | Posttest (Middle, 7th) | -1.52 | | Posttest only | No |
| | | Colorado Student Assessment Program (CSAP) | By teacher group (treatment vs. comparison) | Posttest (Middle, 8th) | 0 | None | Posttest only | No |
| | | Colorado Student Assessment Program (CSAP) | By teacher group (treatment vs. comparison) | Grade 6th | .0864699 | Small | Pretest-Posttest Gain | No |
| | | Colorado Student Assessment Program (CSAP) | By teacher group (treatment vs. comparison) | Grade 7th | .1470775 | Small | Pretest-Posttest Gain | No |
| | | Colorado Student Assessment Program (CSAP) | By teacher group (treatment vs. comparison) | Grade 8th | .1435162 | Small | Pretest-Posttest Gain | No |
| META Associates, 2007 (2) | QED | Student achievement as measured by Colorado Student Assessment Program (CSAP), Overall | By teacher group (treatment vs. comparison) | Posttest 2006 (Elementary & Middle, 4th-8th) | .110911 | Small | Posttest only | Yes |
| | | Student achievement as measured by Colorado Student Assessment Program (CSAP), Overall | By teacher group (treatment vs. comparison) | Elementary & Middle, 4th-8th | -.1933 | | Pretest-Posttest Gain | Yes |
| Meyer & Sutton, 2006 (8) | QED | Metropolitan Achievement Test (MAT), Overall | By teacher group (treatment vs. comparison) | Posttest (Elementary, 5th) | .023587 | Small | Posttest only | No |
| | | Metropolitan Achievement Test (MAT), Math Concepts & Problem Solving | By teacher group (treatment vs. comparison) | Posttest (Middle, 6th) | .074428 | Small | Posttest only | No |
Appendix B continued 

| Study (No. of Effects) | Study Design | Outcome Measure | Unit of Analysis | Time of Measurement/Group | Effect Size | Cohen’s d Standard | Posttest Only or Pretest-posttest Gain | Applied correction for clustering or multiple comparisons? |
| | | Metropolitan Achievement Test (MAT), Math Procedures | By teacher group (treatment vs. comparison) | Posttest (Middle, 6th) | .045459 | Small | Posttest only | No |
| | | Metropolitan Achievement Test (MAT), Overall | By teacher group (treatment vs. comparison) | Posttest (Middle, 6th) | .068535 | Small | Posttest only | No |
| | | Metropolitan Achievement Test (MAT), Overall | By teacher group (treatment vs. comparison) | Posttest (Middle, 7th) | -.09989 | | Posttest only | No |
| | | Criterion Referenced Test, Overall | By teacher group (treatment vs. comparison) | Posttest (Middle, 8th) | .100606 | Small | Posttest only | No |
| | | Criterion Referenced Test, Algebra | By teacher group (treatment vs. comparison) | Posttest (Middle, 8th) | .124888 | Small | Posttest only | No |
| | | Criterion Referenced Test, Computation | By teacher group (treatment vs. comparison) | Posttest (Middle, 8th) | .027889 | Small | Posttest only | No |
| | | Criterion Referenced Test, Data Analysis | By teacher group (treatment vs. comparison) | Posttest (Middle, 8th) | .040299 | Small | Posttest only | No |
| | | Criterion Referenced Test, Geometry & Measurement | By teacher group (treatment vs. comparison) | Posttest (Middle, 8th) | .126806 | Small | Posttest only | No |
| | | Criterion Referenced Test, Numeration | By teacher group (treatment vs. comparison) | Posttest (Middle, 8th) | .048704 | Small | Posttest only | No |
| Niess, 2005 (4) | RCT | Technology Enhanced State Assessment (TESA), Math Computation, Problem-Solving Skills | By teacher group (treatment vs. comparison) | Posttest (Elementary) | .362457 | Small | Posttest only | No |
| | | Technology Enhanced State Assessment (TESA), Math Computation, Problem-Solving Skills | By teacher group (treatment vs. comparison) | Elementary | -.1393 | | Pretest-Posttest Gain | No |
Appendix B continued 

| Study (No. of Effects) | Study Design | Outcome Measure | Unit of Analysis | Time of Measurement/Group | Effect Size | Cohen’s d Standard | Posttest Only or Pretest-posttest Gain | Applied correction for clustering or multiple comparisons? |
| | | Technology Enhanced State Assessment (TESA), Math Computation, Problem-Solving Skills | By teacher group (treatment vs. comparison) | Posttest (Middle) | .128815 | Small | Posttest only | No |
| | | Technology Enhanced State Assessment (TESA), Math Computation, Problem-Solving Skills | By teacher group (treatment vs. comparison) | Middle | .105168 | Small | Pretest-Posttest Gain | No |
| Palmer & Nelson, 2006 (5) | QED | Northwest Evaluation Association (NWEA) assessments, General Science | By teacher group (treatment vs. comparison) | Elementary (3rd, 5th, 6th) | .11 | Small | Pretest-Posttest Gain | No |
| | | Northwest Evaluation Association (NWEA) assessments, General Science | By teacher group (treatment vs. comparison) | Middle (7th, 8th) | .06 | Small | Pretest-Posttest Gain | No |
| | | Northwest Evaluation Association (NWEA) assessments, General Science | By teacher group (treatment vs. comparison) | High (9th, 10th) | -.21 | | Pretest-Posttest Gain | No |
| | | Northwest Evaluation Association (NWEA) assessments, Inquiry | By teacher group (treatment vs. comparison) | Elementary (3rd, 5th, 6th) | -.01 | | Pretest-Posttest Gain | No |
| | | Northwest Evaluation Association (NWEA) assessments, General Science, Inquiry | By teacher group (treatment vs. comparison) | High (9th, 10th) | -.11 | | Pretest-Posttest Gain | No |
| Rubin & Norman, 1992 (8) | RCT | Middle Grades Integrated Process Skill Test (MIPT) | By teacher group (treatment vs. comparison) | Posttest (Middle 6th-9th, Treatment vs. Control I) | -.29421 | | Posttest only | Yes |
| | | Middle Grades Integrated Process Skill Test (MIPT) | By teacher group (treatment vs. comparison) | Posttest (Middle 6th-9th, Treatment vs. Control II) | .553343 | Medium | Posttest only | Yes |
| | | Middle Grades Integrated Process Skill Test (MIPT) | By teacher group (treatment vs. comparison) | Middle 6th-9th, Treatment vs. Control I | .165492 | Small | Pretest-Posttest Gain | Yes |
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Study 
(No. of 
Effects) 

Study 

Design 

Outcome 

Measure 

Unit of 
Analysis 

Time of 

Measurement 

/Group 

Effect 

Size 

Cohen’s 

d 

Standard 

Posttest 
Only or 
Pretest- 
posttest 
Gain 

Applied 
correction for 
clustering or 
multiple 
comparisons? 



Middle Grades 
Integrated 
Process Skill 
Test (MIPT) 

By teacher 
group 
(treatment 
vs. 

comparison) 

Middle 6' n -9' n , 
Treatment vs. 
Control II 

.635319 

Medium 

Pretest- 

Posttest 

Gain 

Yes 



Group 

Assessment of 
Logical 
Thinking Test 
(GALT) 

By teacher 
group 
(treatment 
vs. 

comparison) 

Posttest 
(Middle 6 th -9 th , 
Treatment vs. 
Control 1) 

0.83405 


Posttest 

only 

Yes 



Group 

Assessment of 
Logical 
Thinking Test 
(GALT) 

By teacher 
group 
(treatment 
vs. 

comparison) 

Posttest 
(Middle 6 th -9 th , 
Treatment vs. 
Control II) 

0 

None 

Posttest 

only 

Yes 



Group 

Assessment of 
Logical 
Thinking Test 
(GALT) 

By teacher 
group 
(treatment 
vs. 

comparison) 

Middle 6 ln -9' n , 
Treatment vs. 
Control 1 

-.35745 

Small 

Pretest- 

Posttest 

Gain 

Yes 



Group 

Assessment of 
Logical 
Thinking Test 
(GALT) 

By teacher 
group 
(treatment 
vs. 

comparison) 

Middle 6 tn -9' n , 
Treatment vs. 
Control II 

.119162 

Small 

Pretest- 

Posttest 

Gain 

Yes 

Saxe, 
Gearhart, 
& Nasir, 
2001 (6) 

QED 

Study-specific 

assessments, 

Computational 

Scale) 

By 

teacher/class 

Posttest 
(Elementary- 
Treatment II 
vs. Control) 

-1.36 


Posttest 

only 

No 



Study-specific 

assessments, 

Computational 

Scale) 

By 

teacher/class 

Posttest 
(Elementary- 
Treatment 1 vs. 
Control) 

-.55 


Posttest 

only 

No 



Study-specific 

assessments, 

Conceptual 

Scale 

By 

teacher/class 

Posttest 
(Elementary- 
Treatment II 
vs. Control) 

.72 

Medium- 

Large 

Posttest 

only 

No 



Study-specific 

assessments, 

Conceptual 

Scale 

By 

teacher/class 

Posttest 
(Elementary- 
Treatment 1 vs. 
Control) 

2.54 

Large 

Posttest 

only 

No 



Study-specific 

assessments, 

Overall 

By 

teacher/class 

Posttest 
(Elementary- 
Treatment 1 vs. 
Control) 

-.5667 


Posttest 

only 

Yes 



Study-specific 

assessments, 

Overall 

By 

teacher/class 

Posttest 
(Elementary- 
Treatment II 
vs. Control) 

-1.5541 


Posttest 

only 

Yes 

Scott, 
2005 (2) 

QED 

Iowa Test of 
Basic Skills 
(ITBS), Overall 

By teacher 
group 
(treatment 
vs. 

comparison) 

Posttest (3 ra ) 

.542299 

Medium 

Posttest 

only 

No 



Iowa Test of 
Basic Skills 
(ITBS), Overall 

By teacher 
group 
(treatment 
vs. 

comparison) 

Elementary 

(3 rd ) 

.198872 

Small 

Pretest- 

Posttest 

Gain 

No 
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Appendix B continued 


Study 
(No. of 
Effects) 

Study 

Design 

Outcome 

Measure 

Unit of 
Analysis 

Time of 

Measurement 

/Group 

Effect 

Size 

Cohen’s 

d 

Standard 

Posttest 
Only or 
Pretest- 
posttest 
Gain 

Applied 
correction for 
clustering or 
multiple 
comparisons? 

Siegle & 
McCoach, 
2007 (2) 

RCT 

Math 

Achievement 

Test 

By school 
(treatment 
vs. 

comparison) 

Posttest 
(Elementary, 
5 th , cluster) 

.1959 

Small 

Posttest 

only 

Yes 



Math 

Achievement 

Test 

By school 
(treatment 
vs. 

comparison) 

Posttest 
(Elementary, 
5 th , single site) 

.2159 

Small 

Posttest 

only 

Yes 

Snippe, 
1992 (21) 

RCT 

Terra Nova, 
Overall 

By class 

Posttest (High) 

-.01 

" 

Posttest 

only 

No 



Terra Nova 

By class 

Posttest (High, 
Site A) 

-.43 

” 

Posttest 

only 

No 



Terra Nova 

By class 

Posttest (High, 
Site B) 

.15 

Small 

Posttest 

only 

No 



Terra Nova 

By class 

Posttest (High, 
Site C) 

.01 

Small 

Posttest 

only 

No 



Terra Nova 

By class 

Posttest (High, 
Site D) 

.13 

Small 

Posttest 

only 

No 



Terra Nova 

By class 

Posttest (High, 
Site E) 

.14 

Small 

Posttest 

only 

No 



Terra Nova 

By class 

Posttest (High, 
Site F) 

.04 

Small 

Posttest 

only 

No 



ACCUPLACER, 

Overall 

By class 

Posttest (High) 

.20 

Small 

Posttest 

only 

No 



ACCUPLACER 

By class 

Posttest (High, 
Site A) 

.3 

Small 

Posttest 

only 

No 



ACCUPLACER 

By class 

Posttest (High, 
Site B 

.03 

Small 

Posttest 

only 

No 



ACCUPLACER 

By class 

Posttest (High, 
Site C 

.45 

Small- 

medium 

Posttest 

only 

No 



ACCUPLACER 

By class 

Posttest (High, 
Site D 

.14 

Small 

Posttest 

only 

No 



ACCUPLACER 

By class 

Posttest (High, 
Site E 

-.1 

” 

Posttest 

only 

No 



ACCUPLACER 

By class 

Posttest (High, 
Site F) 

.79 

Large 

Posttest 

only 

No 



WorkKeys, 

Overall 

By class 

Posttest (High) 

.06 

Small 

Posttest 

only 

No 



WorkKeys 

By class 

Posttest (High, 
Site A) 

-.34 

” 

Posttest 

only 

No 



WorkKeys 

By class 

Posttest (High, 
Site B) 

.07 

Small 

Posttest 

only 

No 



WorkKeys 

By class 

Posttest (High, 
Site C) 

.39 

Small 

Posttest 

only 

No 



WorkKeys 

By class 

Posttest (High, 
Site D) 

.48 

Small- 

medium 

Posttest 

only 

No 



WorkKeys 

By class 

Posttest (High, 
Site E) 

-.25 

" 

Posttest 

only 

No 



WorkKeys 

By class 

Posttest (High, 
Site F) 

.13 

Small 

Posttest 

only 

No 

Walsh- 
Cavazos, 
1994 (2) 

QED 

PSG 

Achievement 

Assessment, 

Overall 

By teacher 
group 
(treatment 
vs. 

comparison) 

Posttest 

(Elementary) 

.556633 

Medium 

Posttest 

only 

No 



PSG 

Achievement 

Assessment, 

Overall 

By teacher 
group 
(treatment 
vs. 

comparison) 

Elementary 

.255494 

Small 

Pretest- 

Posttest 

Gain 

No 
Appendix C: Computation of Effect Sizes, Homogeneity Tests and Q Statistic Analysis 


Several computations were carried out to produce effect sizes. For those computed 
using standard formulas, means, standard deviations, and sample sizes were entered into pre-set 
cells in a coding form that calculated effect sizes for continuous outcome measures using 
Cohen’s d: 

$$ d = \frac{\bar{Y}^{trt} - \bar{Y}^{ctrl}}{S_{pool}} $$

where $\bar{Y}^{trt}$ and $\bar{Y}^{ctrl}$ represent the mean values for the treatment and control groups. $S_{pool}$ was 
computed as 

$$ S_{pool} = \sqrt{\frac{(n_i^{trt} - 1)(s_i^{trt})^2 + (n_i^{ctrl} - 1)(s_i^{ctrl})^2}{n_i^{trt} + n_i^{ctrl} - 2}} $$

where $n_i^{trt}$ and $n_i^{ctrl}$ are the sample sizes and $s_i^{trt}$ and $s_i^{ctrl}$ are the standard deviations in study i. 
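The two formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the coding form used in the study; the function names and the input numbers are hypothetical.

```python
import math


def pooled_sd(n_trt, sd_trt, n_ctrl, sd_ctrl):
    """Pooled standard deviation of the treatment and control groups."""
    numerator = (n_trt - 1) * sd_trt ** 2 + (n_ctrl - 1) * sd_ctrl ** 2
    return math.sqrt(numerator / (n_trt + n_ctrl - 2))


def cohens_d(mean_trt, mean_ctrl, n_trt, sd_trt, n_ctrl, sd_ctrl):
    """Standardized mean difference: (treatment mean - control mean) / S_pool."""
    return (mean_trt - mean_ctrl) / pooled_sd(n_trt, sd_trt, n_ctrl, sd_ctrl)


# Hypothetical inputs, not taken from any study in this report.
d = cohens_d(mean_trt=52.0, mean_ctrl=50.0,
             n_trt=30, sd_trt=10.0, n_ctrl=30, sd_ctrl=10.0)
print(round(d, 3))  # 0.2
```

With equal group sizes and equal standard deviations the pooled value equals the common standard deviation, so a 2-point mean difference on a scale with SD 10 gives d = 0.2.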


The odds-ratio formula was used for dichotomous outcome measures: 

$$ OR = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)} = \frac{p_1/q_1}{p_2/q_2} = \frac{p_1 q_2}{p_2 q_1}, $$

where $p_1$ is the proportion of cases with the outcome of interest in the first group, $p_2$ is the 
proportion in the second group, and $q_k = 1 - p_k$. An odds ratio of 1 shows that the outcome 
under study (e.g., achieving math proficiency) is equally likely in both groups. 
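As a quick numeric illustration of the odds-ratio formula, the following Python sketch computes the ratio from two proportions. The function name and the example proportions are hypothetical and do not come from the studies reviewed.

```python
def odds_ratio(p1, p2):
    """Odds ratio (p1 * q2) / (p2 * q1), with q = 1 - p for each group."""
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    q1, q2 = 1 - p1, 1 - p2
    return (p1 * q2) / (p2 * q1)


# Equal proportions: the outcome is equally likely in both groups.
print(odds_ratio(0.5, 0.5))  # 1.0

# Hypothetical case: 60% proficient in the first group, 50% in the second.
print(round(odds_ratio(0.6, 0.5), 3))  # 1.5
```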


Moreover, effect sizes were computed according to whether the study involved a pretest-posttest 
comparison or reported only posttest results. For posttest-only analysis, the effect size was 
computed as the standardized difference between the posttest means of the treatment group and 
the control group. Specifically, 

$$ d_{post} = \frac{\bar{Y}^{trt} - \bar{Y}^{ctrl}}{S_{pool}} $$

where $\bar{Y}^{trt}$ and $\bar{Y}^{ctrl}$ represent the mean posttest values for the treatment and control groups. 
$S_{pool}$ was computed as 

$$ S_{pool} = \sqrt{\frac{(n_i^{trt} - 1)(s_i^{trt})^2 + (n_i^{ctrl} - 1)(s_i^{ctrl})^2}{n_i^{trt} + n_i^{ctrl} - 2}} $$

where $n_i^{trt}$ and $n_i^{ctrl}$ are the sample sizes for the treatment and control groups, respectively, in study i, 
and $s_i^{trt}$ and $s_i^{ctrl}$ are the posttest standard deviations for study i. 


For pretest-versus-posttest analysis, the following formula was used to allow for an overall 
comparison between treatment and control groups, while controlling for the effects of the pretest. 
Specifically, 

$$ d_{pre\_post} = \frac{(\bar{Y}^{trt} - \bar{Y}^{ctrl}) - (\bar{X}^{trt} - \bar{X}^{ctrl})}{S_{pool}} $$

where $\bar{Y}^{trt}$ and $\bar{Y}^{ctrl}$ represent the mean posttest values for the treatment and control groups, 
respectively, and $\bar{X}^{trt}$ and $\bar{X}^{ctrl}$ represent the mean pretest values for the treatment and control 
groups. $S_{pool}$ was computed as 

$$ S_{pool} = \sqrt{\frac{(n_i^{trt} - 1)(s_i^{trt})^2 + (n_i^{ctrl} - 1)(s_i^{ctrl})^2}{n_i^{trt} + n_i^{ctrl} - 2}} $$

where $n_i^{trt}$ and $n_i^{ctrl}$ are the sample sizes for the treatment and control groups, respectively, in study i, 
and $s_i^{trt}$ and $s_i^{ctrl}$ are the posttest standard deviations for study i. 
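The pretest-adjusted effect size described above can be sketched as follows. The function name and the example values are hypothetical; the sketch assumes, as the text states, that the pooled standard deviation is formed from the posttest standard deviations.

```python
import math


def pre_post_effect_size(post_trt, post_ctrl, pre_trt, pre_ctrl,
                         n_trt, sd_post_trt, n_ctrl, sd_post_ctrl):
    """Posttest mean difference minus pretest mean difference, divided by
    the pooled posttest standard deviation."""
    s_pool = math.sqrt(
        ((n_trt - 1) * sd_post_trt ** 2 + (n_ctrl - 1) * sd_post_ctrl ** 2)
        / (n_trt + n_ctrl - 2)
    )
    return ((post_trt - post_ctrl) - (pre_trt - pre_ctrl)) / s_pool


# Hypothetical inputs: the treatment group starts 1 point ahead at pretest
# and ends 4 points ahead at posttest, on a scale with posttest SD 10.
d_gain = pre_post_effect_size(post_trt=55.0, post_ctrl=51.0,
                              pre_trt=49.0, pre_ctrl=48.0,
                              n_trt=25, sd_post_trt=10.0,
                              n_ctrl=25, sd_post_ctrl=10.0)
print(round(d_gain, 3))  # 0.3
```

Subtracting the pretest difference removes the 1-point head start, so only the remaining 3-point difference is standardized.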

For studies reporting multilevel analyses, effects were computed following Hedges’ suggestions 
when the intraclass correlation was reported in the studies (Hedges, 2007). 

Homogeneity tests were conducted for each type of measure and subject (math posttest only, 
math pretest-posttest gains, science posttest only, and science pretest-posttest gains) to determine 
whether effects from the studied populations are similar or homogeneous. In the case of this 
meta analysis, the null hypothesis asserts that the effects represent the same population. In all 
four cases, the null hypothesis is rejected. 


[Figure: Funnel plot with pseudo 95% confidence limits] 


The Q-statistic, or Q test, was used to assess whether there is true heterogeneity (between-studies 
variability) in a meta analysis, which in turn determines the statistical model (fixed-effects or 
random-effects) used to calculate a mean effect size from the meta analysis data. For this 
meta analysis, if the studies’ results differed by no more than sampling error, a fixed-effects 
model was applied. If the studies’ results differed by more than sampling error (the 
heterogeneous case), a random-effects model was applied. Significance was determined at the 
p < .05 level. For more information on the Q statistic, see Lipsey & Wilson, 2001. 
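The decision rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually run for this report: the effects are weighted by inverse variance, the between-studies variance is estimated with the DerSimonian-Laird method (the report does not name its estimator), and the chi-square critical value is supplied by the caller. All names and numbers are hypothetical.

```python
import math


def q_statistic(effects, variances):
    """Return (Q, fixed-effects mean, inverse-variance weights)."""
    weights = [1.0 / v for v in variances]
    mean_fixed = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - mean_fixed) ** 2 for w, d in zip(weights, effects))
    return q, mean_fixed, weights


def pooled_effect(effects, variances, q_critical):
    """Pick a fixed- or random-effects mean depending on the Q test.

    q_critical is the chi-square critical value at the chosen alpha for
    len(effects) - 1 degrees of freedom (assumed to be looked up elsewhere).
    """
    q, mean_fixed, weights = q_statistic(effects, variances)
    df = len(effects) - 1
    if q <= q_critical:
        return "fixed", mean_fixed, math.sqrt(1.0 / sum(weights))
    # Assumption: DerSimonian-Laird estimate of between-studies variance.
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    mean_random = sum(w * d for w, d in zip(w_star, effects)) / sum(w_star)
    return "random", mean_random, math.sqrt(1.0 / sum(w_star))


# 5.991 is the chi-square critical value for df = 2 at alpha = .05.
print(pooled_effect([0.1, 0.2, 0.3], [0.01, 0.01, 0.01], 5.991)[0])   # fixed
print(pooled_effect([-0.5, 0.2, 0.9], [0.01, 0.01, 0.01], 5.991)[0])  # random
```

In the second example the spread of the three hypothetical effects far exceeds what their sampling variances allow, so Q is large and the random-effects branch widens the standard error accordingly.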

The homogeneity test for the 21 effects from the math pretest-posttest gains data set was 
statistically significant (Q(20) = 153.71, p < .0005, I² = .870), indicating that the effects do not 
represent the same population. The weighted mean effect under the random-effects model for 
these 21 data points is .210 (SE = .078), which indicated that the mean effect differed from zero 
(z = 2.70, p = .007) with a 95% confidence interval (CI) from .048 to .373. The funnel plot 
below illustrates the distribution of the effects, and also provides a way to gauge the presence of 
publication bias. There is some asymmetry, and points appear to be missing in the negative 
range, suggesting possible bias. 


[Figure: Funnel plot with pseudo 95% confidence limits] 


The homogeneity test for the 68 effects from the math posttest only data set was statistically 
significant (Q(67) = 328.785, p < .0005, I² = .796), indicating that the effects also do not 
represent the same population. The weighted mean effect under the random-effects model for 
these 68 data points is .132 (SE = .0455), which indicated that the mean effect differed from zero 
(z = 4.05, p < .001) with a 95% confidence interval (CI) from .041 to .223. The funnel plot 
below represents these findings. This plot is less suggestive of publication bias. 

[Figure: Funnel plot with pseudo 95% confidence limits] 

The homogeneity test for the 10 effects from the science pretest-posttest gain data set was 
statistically significant (Q(9) = 31.57, p < .0005, I² = .715), indicating that the effects also do not 
represent the same population. The weighted mean effect under the random-effects model for 
these 10 data points is .046 (SE = .0838), with a 95% confidence interval (CI) from -.143 to .236. 
Here the mean effect does not differ from zero. The funnel plot below shows these findings, and 
is too sparse to provide a good assessment of the likelihood of publication bias. 

[Figure: Funnel plot with pseudo 95% confidence limits] 

The homogeneity test for the 7 effects from the science posttest only data set was statistically 
significant (Q(6) = 84.15, p < .0005, I² = .929), indicating that the effects do not represent the 
same population. The weighted mean effect under the random-effects model for these 7 data 
points is .176 (SE = .237), with a 95% confidence interval (CI) from -.404 to .757. Again, this 
mean does not differ from zero. The funnel plot below shows the distribution of the effects, and 
the small number of effects precludes making a good assessment of bias. 
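The heterogeneity index reported alongside each Q value (read here as I²) can be recovered from Q and its degrees of freedom with the usual relation I² = (Q - df) / Q. The short Python check below applies that relation to the four reported Q statistics; the function name and dictionary layout are illustrative only.

```python
def i_squared(q, df):
    """Share of total variation attributed to between-studies heterogeneity."""
    return max(0.0, (q - df) / q)


# (Q, degrees of freedom, reported index) for the four data sets above.
reported = {
    "math pretest-posttest gain": (153.71, 20, 0.870),
    "math posttest only": (328.785, 67, 0.796),
    "science pretest-posttest gain": (31.57, 9, 0.715),
    "science posttest only": (84.15, 6, 0.929),
}

for name, (q, df, reported_i2) in reported.items():
    print(f"{name}: computed {i_squared(q, df):.3f}, reported {reported_i2:.3f}")
```

All four computed values agree with the reported figures to three decimals, which supports reading the reported index as I².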
Appendix D: Correlation Table of Math Post-Only Professional Development Design Elements 

| Design element | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Time | | | | | | | | | | | | | |
| 1. Contact Hr. | 1 | | | | | | | | | | | | |
| 2. Frequency | .741** | 1 | | | | | | | | | | | |
| 3. Duration | .834** | .623** | 1 | | | | | | | | | | |
| PD Activities | | | | | | | | | | | | | |
| 4. Summer Institutes | .577** | .399** | .655** | 1 | | | | | | | | | |
| 5. College Courses | .744** | -.171 | .596** | .618** | 1 | | | | | | | | |
| 6. Conferences | -.196 | .094 | .146 | -.403** | -.249* | 1 | | | | | | | |
| 7. Study Group | -.694** | -.253 | -.602** | -.524** | -.369** | .287* | 1 | | | | | | |
| Active Learning | | | | | | | | | | | | | |
| 8. Lead Discussion | -.196 | .094 | .146 | -.403** | -.249* | 1.000** | .287* | 1 | | | | | |
| 9. Learning Network | -.657** | .048 | -.601** | -.351** | -.471** | .249* | .796** | .249* | 1 | | | | |
| 10. Develop Assessments | -.138 | .398** | .135 | .345** | -.249* | -.172 | .021 | -.172 | .155 | 1 | | | |
| 11. Observe Teachers | -.154 | .562* | .084 | .418** | -.360** | -.249* | -.298* | -.249* | -.093 | .692** | 1 | | |
| 12. Classroom Mentoring | -.421** | -.571** | -.742** | -.394** | -.028 | -.347** | .579** | -.347** | .502** | -.347** | -.502** | 1 | |
| Coherence - 2 Types | .043 | -.161 | .106 | -.406** | -.244* | .221 | .163 | .221 | -.158 | -.080 | -.324** | -.059 | 1 |

Two-tailed test: * significant at p < .05; ** significant at p < .01 